Efficient  BDD-Based  Planning  for 
Non-Deterministic,  Fault-Tolerant,  and 
Adversarial  Domains 

Rune Møller Jensen

June  2003 

CMU-CS-03-139 


School  of  Computer  Science 
Carnegie  Mellon  University 
Pittsburgh,  PA  15213 

Submitted  in  partial  fulfillment  of  the  requirements 
for  the  degree  of  Doctor  of  Philosophy. 


Thesis  Committee: 

Manuela  M.  Veloso  (Co-Chair),  Carnegie  Mellon  University 
Randal  E.  Bryant  (Co-Chair),  Carnegie  Mellon  University 
Reid  Simmons,  Carnegie  Mellon  University 
Paolo  Traverso,  IRST,  Trento,  Italy 


Copyright © 2003 Rune Møller Jensen


This  research  was  supported  in  part  by  the  Danish  Ministry  of  Science,  Technology  and  Innovation  and  the 
United  States  Air  Force  under  Grants  Nos  F30602-00-2-0549  and  F30602-98-2-0135. 

The  views  and  conclusions  contained  in  this  document  are  those  of  the  author  and  should  not  be  interpreted 
as  necessarily  representing  the  official  policies  or  endorsements,  either  expressed  or  implied,  by  the  Danish 
Government,  the  United  States  Air  Force,  or  the  US  Government. 


Approved  for  public  release;  distribution  unlimited 

Keywords: Automated Planning, Heuristic Search, Binary Decision Diagrams, Artificial
Intelligence, Symbolic Model Checking, Controller Synthesis.


To  my  grandparents 




Abstract 


Automated  planning  considers  selecting  and  sequencing  actions  in  order 
to  change  the  state  of  a  discrete  system  from  some  initial  state  to  some  goal 
state.  This  problem  is  fundamental  in  a  wide  range  of  industrial  and  academic 
fields including robotics, automation, embedded systems, and operational research.
Planning with non-deterministic actions can be used to model dynamic
environments  and  alternative  action  behavior.  One  of  the  currently  best  known 
approaches  is  to  employ  reduced  ordered  Binary  Decision  Diagrams  (BDDs) 
to  represent  and  generate  plans  using  techniques  developed  in  symbolic  model 
checking.  However,  the  approach  is  challenged  by  a  frequent  blow-up  of  the 
BDDs  representing  the  search  frontier  and  a  limited  number  of  solution  classes. 

This thesis addresses both of these problems. With respect to the first,
it contributes a general framework called state-set branching that seamlessly
combines classical heuristic search and BDD-based search. Our experimental
results show that the performance of state-set branching often dominates
both blind BDD-based search and ordinary heuristic search. In addition, it
consistently outperforms every previous approach we are aware of for guiding a
BDD-based search. We show that state-set branching naturally generalizes to
non-deterministic planning and introduce heuristically guided versions of the
current BDD-based non-deterministic planning algorithms.

With respect to the second problem, the thesis introduces two frameworks
called fault tolerant planning and adversarial planning. Fault tolerant planning
addresses domains where non-determinism is caused by rare errors. The
current solution classes handle this situation poorly: they either take all fault
combinations into account or produce solutions that are too weak. The thesis
contributes a new class of solutions called fault tolerant plans that are robust
to a limited number of faults. In addition, it introduces specialized BDD-based
algorithms for synthesizing fault tolerant plans.

Adversarial planning considers situations where non-determinism is caused
by uncontrollable, but known, environment actions. The current solution classes
of BDD-based non-deterministic planning assume a “friendly” environment
and may never reach a goal state if the environment is hostile and informed.
The thesis contributes efficient BDD-based algorithms for synthesizing
winning strategies for such problems.


Chapter 1
Introduction 


Planning  is  a  fundamental  aspect  of  human  activity.  It  considers  how  to  select  and  sequence 
actions  in  order  to  achieve  specific  goals.  This  problem  arises  in  a  wide  range  of  situations. 
For  instance,  we  need  to  select  and  sequence  rotations  very  carefully  in  order  to  change 
Rubik’s  Cube  from  some  initial  configuration  to  its  goal  configuration  where  each  side  has 
identically  colored  tiles.  Planning  is  also  needed  to  control  a  power  plant  in  order  to  recover 
from  any  possible  failure  of  the  plant.  This  problem  is  fairly  different  from  solving  Rubik’s 
Cube  since  the  control  actions  depend  on  an  interacting  environment.  In  general,  we  can 
divide  planning  problems  into  two  main  categories:  deterministic  planning  problems  and 
non-deterministic  planning  problems.  Rubik’s  Cube  is  a  good  example  of  a  deterministic 
planning  problem.  Each  action  has  only  one  possible  outcome.  In  other  words,  actions  are 
deterministic.  This  is  not  the  case  for  non-deterministic  planning  problems.  Consider  the 
power  plant  problem.  Due  to  failures,  actions  may  either  succeed  or  fail.  Thus,  several 
outcomes  of  actions  are  possible.  In  general,  dynamic  environments  cause  actions  to  be 
non-deterministic  since  we  are  unable  to  determine  their  exact  effect. 

Planning  problems  are  often  very  hard  to  solve  in  practice.  There  are  several  reasons 
for  this.  First,  real-world  domains  tend  to  be  extremely  large.  Assume  for  instance  that  the 
power  plant  described  above  consists  of  n  units  that  each  can  be  in  at  least  two  different 
states. The total number of states of the power plant is then at least $2^n$. Thus, the state
space  of  the  power  plant  grows  exponentially  with  the  number  of  units.  This  is  a  common 
problem  for  real-world  domains  and  has  been  termed  the  state  space  explosion  problem. 
Second,  plans  for  real-world  problems  are  often  long.  A  plan  for  controlling  a  power 
plant  or  loading  a  container  ship  may  involve  sequencing  thousands  of  actions.  Third, 
the  combinatorial  complexity  of  planning  may  be  high.  The  Rubik’s  Cube  is  a  hard  puzzle 
because  it  is  impossible  to  move  one  tile  without  affecting  the  position  of  several  other 
tiles.  Similarly,  routing  wires  between  units  on  an  integrated  circuit  is  complicated  because 
routing one wire strongly constrains how the remaining wires can be routed. Fourth,
a  planning  problem  may  not  only  be  to  generate  a  valid  plan  but  an  optimal  one.  We  may 
want  to  minimize  the  energy  consumption  or  time  used  to  load  a  container  ship  or  we  may 
want  to  use  a  minimum  number  of  connections  between  layers  (vias)  when  routing  the 
wires  of  an  integrated  circuit.  Such  optimal  plans  can  be  much  harder  to  find  than  just  valid 
ones. 

Automated  planning  is  a  subfield  of  Artificial  Intelligence  (AI)  concerned  with  how  to 
generate  plans  automatically.  The  traditional  approach  is  to  make  a  discrete  abstraction  of 
the  real-world  domain  and  search  for  a  solution.  More  formally,  a  planning  domain  consists 
of  a  finite  set  of  states,  a  finite  set  of  actions,  and  a  transition  relation  defining  the  effect  of 
actions.  A  planning  problem  consists  of  a  planning  domain,  an  initial  state,1  and  a  set  of 
goal  states.  For  a  deterministic  planning  problem,  a  plan  is  a  sequence  of  actions  forming  a 
path leading from the initial state to one of the goal states. For a non-deterministic planning
problem  where  actions  may  lead  to  several  possible  next  states,  a  plan  may  be  defined  as 
a  function  associating  states  with  relevant  actions  to  apply  at  the  state  in  order  to  reach 
a  goal  state.  Planning  domains  and  planning  problems  are  traditionally  described  in  a 
planning  language  that  uses  propositional  or  first  order  logic  to  define  actions  and  states. 
This encoding scheme causes the formal complexity of planning to be PSPACE-complete
[29].
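
As a minimal illustration of these definitions, the following Python sketch represents a small deterministic planning problem explicitly and finds a plan by breadth-first search. The toy domain, the state and action names, and the function `find_plan` are assumptions made for this example and do not come from the thesis.

```python
from collections import deque

# Hypothetical toy domain: the transition relation maps (state, action) to the next state.
TRANSITIONS = {
    ("s0", "a"): "s1",
    ("s0", "b"): "s2",
    ("s1", "c"): "s3",
    ("s2", "c"): "s0",
}


def find_plan(transitions, initial, goals):
    """Return a shortest action sequence from `initial` to a goal state, or None."""
    queue = deque([(initial, [])])
    visited = {initial}
    while queue:
        state, plan = queue.popleft()
        if state in goals:
            return plan
        for (src, action), dst in transitions.items():
            if src == state and dst not in visited:
                visited.add(dst)
                queue.append((dst, plan + [action]))
    return None  # no goal state is reachable


plan = find_plan(TRANSITIONS, "s0", {"s3"})  # ["a", "c"]
```

The explicit enumeration of states used here is exactly what becomes infeasible when the state space grows exponentially with the size of the domain description.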

The  primary  challenge  of  automated  planning  is  to  provide  efficient  algorithms  and  data 
structures  to  represent  and  synthesize  plans  that  scale  to  real-world  problems.  During  its 
more than 40 years of existence,2 the field has contributed a vast number of effective search
techniques including means-ends analysis [126], hierarchical abstraction [148], partial-order
planning  [149],  case-based  planning  [72,  164],  graph-planning  [18],  heuristic  search  [75, 
20],  and  planning  and  learning  [163].  In  addition,  there  has  recently  been  successful  work 
on  reducing  planning  to  satisfiability  [99],  model  checking  [33],  and  integer  programming 
[19].  In  recent  years,  the  efficiency  of  planning  systems  has  grown  considerably.  However, 
scaling  to  moderately  large  real-world  problems  is  still  an  open  problem. 

The  secondary  challenge  of  automated  planning  is  to  provide  planning  algorithms  that 
can  handle  essential  properties  of  real-world  domains  such  as  time,  dynamic  environments, 
partial  observability  of  states,  and  concurrency.  The  challenge  is  not  simply  to  extend  the 
expressiveness of planning languages and generalize algorithms to manage all these properties,
but instead to carefully develop new representations and algorithms with an attractive
trade-off between expressiveness and scalability. Non-deterministic planning is an example
of such a positive trade-off. Non-determinism can model essential properties of dynamic
domains such as uncontrollable actions and alternative action behavior without introducing
computationally expensive elements in the model like continuous time and probability
distributions.

1 If the initial state is uncertain, we may represent it by a set of states.

2 The General Problem Solver (GPS) [126] is widely considered the first automated problem solver.

Recently,  efficient  algorithms  using  reduced  ordered  Binary  Decision  Diagrams  (BDDs) 
[26]  to  represent  plans  and  applying  implicit  search  techniques  developed  for  symbolic 
model checking [119] have been shown to outperform a wide range of the previous
approaches to non-deterministic planning [34]. However, a major challenge of this line of
research  is  that  non-deterministic  domain  models  often  are  coarse  abstractions  that  make  it 
hard  to  define  strong  solution  models. 

To  summarize,  planning  is  about  selecting  and  sequencing  actions  in  order  to  obtain 
specific  goals.  In  general,  planning  problems  can  be  divided  according  to  the  environment 
which  either  can  be  non-interacting  or  interacting.  In  both  cases,  most  real-world  planning 
problems  are  very  hard  and  the  primary  goal  of  automated  planning  is  to  provide  efficient 
algorithms  and  data  structures  that  scale  to  real-world  applications.  A  secondary  goal  is 
to  develop  planning  systems  that  handle  essential  properties  of  real-world  domains  such 
as  dynamic  environments  and  time.  An  interesting  recent  development  in  this  direction  is 
BDD-based  non-deterministic  planning.  However,  the  range  of  solution  classes  developed 
within  this  framework  is  still  limited. 


1.1  Approach 

The overall objective of the thesis is to contribute efficient algorithms for non-deterministic
planning. We consider the universal planning [154] approach to non-deterministic planning.
In universal planning, actions are assumed to be non-deterministic in the sense that
they may have several possible outcomes. States, on the other hand, are assumed to be
fully observable. A universal plan is a function mapping states to sets of relevant actions
to apply in order to reach a goal state. It is executed by iteratively observing the current
state and applying one of the actions in the plan associated with that state. From a control
theoretic point of view, universal planning corresponds to automated controller synthesis
of discrete, untimed, and memoryless controllers.
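
The execution scheme just described can be sketched in a few lines of Python. This is only an illustration: the two-action domain, the plan, and the function `execute` are hypothetical, and the random choice of outcome merely stands in for an environment that the planner cannot control.

```python
import random

# Hypothetical non-deterministic domain: (state, action) -> set of possible next states.
OUTCOMES = {
    ("low", "pump"): {"low", "ok"},  # the pump may fail and leave the level low
    ("ok", "heat"): {"ready"},
}

# A universal plan maps each covered state to a set of relevant actions.
UNIVERSAL_PLAN = {"low": {"pump"}, "ok": {"heat"}}


def execute(plan, outcomes, state, goals, max_steps=1000, rng=random):
    """Repeatedly observe the state and apply one of the plan's actions for it."""
    trace = [state]
    for _ in range(max_steps):
        if state in goals:
            return trace
        if state not in plan:
            return None  # the plan does not cover the observed state
        action = rng.choice(sorted(plan[state]))
        # The environment, not the controller, selects the actual outcome.
        state = rng.choice(sorted(outcomes[(state, action)]))
        trace.append(state)
    return trace if state in goals else None
```

Because the plan is consulted only through the currently observed state, the controller needs no memory of earlier states, which matches the memoryless view mentioned above.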

A main assumption of the thesis is that the computational advantages of non-deterministic
domain models outweigh the problems of defining practically useful solution classes due
to their limited expressive power. We believe that the absence of continuous elements, such as
time and probability distributions, may be essential for developing algorithms that scale
to real-world problems. In addition, we trust that the limitation of the expressive power
of non-deterministic abstractions can be overcome by adding further information to the
model identifying different sources of non-determinism (e.g., by distinguishing between
successful and failure outcomes of actions).

The thesis relies on the efficiency of BDDs to represent and generate non-deterministic
plans. A BDD is a rooted Directed Acyclic Graph (DAG) representing a Boolean function.
Its main advantage is that the number of nodes in the BDD graph often is much smaller
than the number of truth assignments of the Boolean function it represents. State-of-the-art
BDD-based non-deterministic planning algorithms iteratively construct a BDD representing
the plan. This is done by an implicit breadth-first backward search from the goal states
to the initial state carried out entirely with BDDs. Due to the compactness of BDDs, this
approach may reduce both the time and space complexity exponentially compared to explicit
search techniques.
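
The backward search described above can be sketched with explicit Python sets standing in for BDDs. This is only an illustration of the layer-by-layer search structure under that assumption. The domain, the function name `backward_plan`, and the choice to keep every action with at least one outcome in the previous layer are hypothetical choices made for this example, not the algorithms of the thesis.

```python
# Hypothetical domain: (state, action) -> set of possible next states.
OUTCOMES = {
    ("low", "pump"): {"low", "ok"},
    ("ok", "heat"): {"ready"},
    ("ok", "drain"): {"low"},
}


def backward_plan(outcomes, goals):
    """Breadth-first backward search from the goal states.

    Returns a dict mapping each state found to the actions that may lead
    one layer closer to a goal state.
    """
    plan = {}
    covered = set(goals)
    frontier = set(goals)
    while frontier:
        # Preimage step: state-action pairs with some outcome in the frontier.
        layer = {}
        for (state, action), nexts in outcomes.items():
            if state not in covered and nexts & frontier:
                layer.setdefault(state, set()).add(action)
        plan.update(layer)
        frontier = set(layer)
        covered |= frontier
    return plan


plan = backward_plan(OUTCOMES, {"ready"})  # {"ok": {"heat"}, "low": {"pump"}}
```

In a BDD-based implementation each of the sets `frontier`, `covered`, and `layer` would be a single BDD, and the loop body would be a handful of BDD operations rather than an iteration over individual transitions.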

The current approaches to BDD-based non-deterministic planning face two major challenges.
The first is that BDD-based non-deterministic planning, despite its unprecedented
efficiency, still does not scale to large real-world problems. Often the BDDs representing
the backward breadth-first search frontier blow up [86]. This tendency seems to be worse
for typical planning problems compared to typical model checking problems. One reason
for this might be that planning domains often represent hard combinatorial problems such
as channel routing in VLSI design, whereas model checking benchmarks often are industrial
cases with no particular intention of being combinatorially hard. Another difference
is  the  diameter  of  the  finite  transition  graphs  representing  the  planning  domain.  Planning 
problems  are  deliberately  designed  to  have  transition  graphs  with  large  diameters  causing 
plan  solutions  to  be  long.  Again,  model  checking  benchmarks  are  not  particularly  chosen 
to  fulfill  this  requirement. 

The second challenge of BDD-based non-deterministic planning is that the limited expressive
power of the non-deterministic abstraction makes it hard to define solution classes
that are useful in practice. Non-deterministic domain models often hide too much information
about the source of non-determinism to allow useful solution models to be defined.

Given these challenges, the goal of the thesis is to answer two questions:

1. Can the computational efficiency of BDD-based non-deterministic planning be improved?

2. Is it possible to improve the current solution classes for BDD-based non-deterministic
planning?

The thesis addresses the first question by introducing a seamless combination of BDD-based
search and heuristic search.3 The advantage of heuristic search algorithms such as
pure heuristic search and A* [75] compared to uninformed or blind search algorithms such
as depth-first search and breadth-first search is that they use heuristics to prioritize the node
expansion in the search tree and in this way guide the search toward a solution. In most
cases, the number of states of a guided search frontier grows more slowly with the search depth
than the number of states of an unguided search frontier. We therefore expect that the size of
BDDs representing a guided search frontier grows more slowly than the size of BDDs representing
a blind search frontier. Several attempts have been made to implement BDD-based versions
of these algorithms. They have, however, either been inefficient [53, 74, 180] or too narrow
in scope [179]. The approach introduced in the thesis is called state-set branching. The
philosophy of state-set branching is that the information represented by BDDs must be
semantically closely related in order for the BDD operations to work efficiently. In contrast
to previous work, state-set branching avoids arithmetic computations at the BDD level in
each iteration of the search algorithm. Instead, these computations are integrated in the
BDD operation computing the search frontier. State-set branching is general. It applies
to any heuristic function, any evaluation function, and any transition cost function. In
addition, state-set branching extends beyond classical heuristic search and deterministic
planning. In its non-deterministic version, called non-deterministic state-set branching,
it can be used to dramatically improve the performance of non-deterministic BDD-based
planning algorithms not only in terms of computational efficiency but also in terms of the
size of the produced plans.

3 We will use the terms heuristic search, guided search, directed search, and informed search
interchangeably.
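
For readers unfamiliar with guided search, the following Python sketch shows an ordinary explicit-state A* in which the heuristic prioritizes node expansion. It is not state-set branching, which operates on BDDs. The function names, the one-dimensional example domain, and its heuristic are assumptions made for illustration.

```python
import heapq


def a_star(neighbors, h, start, goal):
    """Return the cost of a cheapest path from start to goal, or None.

    `neighbors(n)` yields (successor, cost) pairs and `h(n)` estimates the
    remaining cost. The result is optimal when h never overestimates.
    """
    open_heap = [(h(start), 0, start)]
    best_g = {start: 0}
    while open_heap:
        f, g, node = heapq.heappop(open_heap)
        if node == goal:
            return g
        if g > best_g.get(node, float("inf")):
            continue  # stale queue entry
        for succ, cost in neighbors(node):
            new_g = g + cost
            if new_g < best_g.get(succ, float("inf")):
                best_g[succ] = new_g
                heapq.heappush(open_heap, (new_g + h(succ), new_g, succ))
    return None


def line_neighbors(n):
    # Hypothetical domain: positions 0..5 on a line, unit-cost moves left or right.
    return [(m, 1) for m in (n - 1, n + 1) if 0 <= m <= 5]


cost = a_star(line_neighbors, lambda n: abs(5 - n), 0, 5)  # 5
```

Nodes with a low value of path cost plus heuristic estimate are expanded first, so the search frontier tends to stay concentrated around promising states rather than growing uniformly in all directions.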

The thesis addresses the second question by introducing two extensions of the
ordinary non-deterministic domain model. The first extension is based on the key observation
that non-determinism in real-world domains often is caused by infrequent errors that
make otherwise deterministic actions fail. In many cases, no actions can be guaranteed
to succeed. For such problems, plans taking all combinations of faults into account seldom
exist. The approach introduced in the thesis is called fault tolerant planning. Fault tolerant
plans are robust to a limited number of faults happening during execution.

The  second  extension  considers  situations  where  the  main  source  of  non-determinism 
is  uncontrollable  actions  selected  by  a  possibly  hostile  environment.  By  extending  the 
ordinary  non-deterministic  domain  model  to  explicitly  represent  environment  actions,  it 
is  possible  to  reason  about  the  actions  of  the  environment  during  planning.  The  approach 
introduced  in  the  thesis  is  called  adversarial  planning.  The  key  idea  is  to  prune  unfair  states 
from  the  plans  where  the  environment  has  an  action  for  which  no  counter  action  exists  that 
may  cause  progress  toward  the  goal  states. 


1.2 Thesis Contributions

The  thesis  has  five  major  contributions: 

1.  State-Set  Branching 

State-set branching currently appears to be the most general and most computationally
efficient framework for combining classical heuristic search and BDD-based
search. It applies to any best-first search algorithm, any heuristic function, any
evaluation function, and any transition cost function. A state-set branching implementation
of A* often dominates both the ordinary explicit-state implementation of A* and
blind BDD-based search.

2.  Non-Deterministic  State-Set  Branching 

Non-deterministic  state-set  branching  is,  as  far  as  we  know,  the  first  framework  for 
guiding  BDD-based  non-deterministic  planning  algorithms.4  Even  for  fairly  weak 
heuristics,  extensive  performance  improvements  over  the  current  non-deterministic 
BDD-based  planning  algorithms  can  be  obtained  not  only  in  terms  of  computation 
speed  but  also  in  terms  of  the  size  of  the  produced  plans. 

3.  Fault  Tolerant  Planning 

To  our  knowledge,  the  fault  tolerant  planning  algorithms  introduced  in  the  thesis 
are  the  first  algorithms  to  synthesize  fault  tolerant  control  strategies  given  a  domain 
description  that  explicitly  represents  successful  and  failure  effects  of  actions. 

4.  Adversarial  Planning 

Adversarial planning is, as far as we know, the first work that studies fully implemented
and complete symbolic algorithms for synthesizing strategies for winning
concurrent  reachability  games  with  probability  1  or  positive  probability.  To  our 
knowledge,  it  also  is  the  first  work  that  provides  such  algorithms  in  a  format  that 
enables  guided  search  techniques  to  be  applied. 

5.  NADL+ 

NADL+ is an extension of NADL [93]. To our knowledge, it is the first representation
language suitable for planning that explicitly represents both uncontrollable
environment actions and failure effects of actions.

As described in the previous section, these contributions lie along two orthogonal
axes: computational efficiency and solution quality. State-set branching falls on the first
axis; fault tolerant planning and adversarial planning mainly fall on the second. In addition
to this work, the thesis contributes formal correctness and optimality proofs for algorithms
where these properties are nontrivial. Moreover, the thesis provides the Bdd-based
InFoRmed planning and controller Synthesis Tool (BIFROST) for solving search and planning
problems described in PDDL [118] and NADL+. BIFROST is fully implemented and
currently includes 8 deterministic and 10 non-deterministic planning and search algorithms.

4 By non-deterministic planning, we refer to the definition given in Section 3.2.


1.3  Document  Outline 

The remainder of the thesis is organized as follows. Chapter 2 presents background material
including BDDs, symbolic model checking, and heuristic search. Chapter 3 presents
basic techniques for encoding deterministic STRIPS [58] and non-deterministic NADL
and NADL+ planning problems with BDDs and presents unguided BDD-based search
algorithms for synthesizing deterministic and non-deterministic plans. Chapter 4 introduces
state-set branching. It is shown how the framework can be used to implement algorithms
for classical heuristic search and deterministic planning. Chapter 5 introduces
non-deterministic state-set branching and describes guided versions of the blind BDD-based
non-deterministic planning algorithms. Chapter 6 defines fault tolerant planning and introduces
both blind and guided BDD-based algorithms for generating fault tolerant plans.
Chapter 7 presents adversarial planning and describes two BDD-based adversarial planning
algorithms. Finally, Chapter 8 discusses related work, and Chapter 9 presents conclusions
and future work. BIFROST is described in Appendix A and correctness and optimality
proofs are given in Appendix B.


Chapter 2
Background 


This chapter presents a range of formalisms and logics used in the thesis and gives an
introduction to symbolic model checking and classical heuristic search. Section 2.1 describes
Quantified Boolean Formulas (QBF), Kripke structures, and Computation Tree
Logic (CTL). Section 2.2 presents the BDD data structure and modern BDD packages.
Section 2.3 describes symbolic model checking. It shows how BDDs can be used to represent
and search a finite transition system and describes a technique known as transition relation
partitioning used to lower the complexity of BDD-based search. Finally, Section 2.4 gives
a brief introduction to classical heuristic search.


2.1  Logics  and  Formalisms 

This  section  presents  Quantified  Boolean  Formulas  (QBF)  [1],  Kripke  structures  [111],  and 
the  Computation  Tree  Logic  (CTL)  [10,  55].  QBF  provides  a  concise  notation  for  complex 
operations  on  Boolean  formulas  which  we  will  use  extensively  to  define  BDD  operations. 
Kripke structures and CTL are basic tools for specifying behavior of non-deterministic
systems [40, 42]. We will use them to define various classes of non-deterministic plans.

2.1.1 Quantified Boolean Formulas

Quantified Boolean Formulas (QBF) is ordinary propositional logic extended with
quantification of Boolean variables.

Definition 2.1 (QBF syntax) Given a set $V = \{v_1, \ldots, v_n\}$ of propositional variables,
QBF($V$) formulas are inductively defined by

• every variable in $V$ is a formula,

• if $f$ and $g$ are formulas, then so are $\neg f$, $f \wedge g$, and $f \vee g$, and

• if $f$ is a formula and $v \in V$, then $\exists v . f$ and $\forall v . f$ are formulas.


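A truth assignment for QBF($V$) is a function $\sigma : V \to \mathbb{B}$. We will use the notation
$\sigma(v \leftarrow a)$ for the truth assignment defined by

\[
\sigma(v \leftarrow a)(w) =
\begin{cases}
a & \text{if } v = w \\
\sigma(w) & \text{otherwise.}
\end{cases}
\]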

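Definition 2.2 (QBF Semantics) If $f$ is a formula in QBF($V$) and $\sigma$ is a truth assignment,
we will write $\sigma \models f$ to denote that $f$ is true under the assignment $\sigma$. The relation $\models$ is
defined inductively in the obvious manner

\[
\begin{array}{lcl}
\sigma \models v & \text{iff} & \sigma(v) = \text{true}, \\
\sigma \models \neg f & \text{iff} & \sigma \not\models f, \\
\sigma \models f \vee g & \text{iff} & \sigma \models f \text{ or } \sigma \models g, \\
\sigma \models f \wedge g & \text{iff} & \sigma \models f \text{ and } \sigma \models g, \\
\sigma \models \exists v . f & \text{iff} & \sigma(v \leftarrow \text{false}) \models f \text{ or } \sigma(v \leftarrow \text{true}) \models f, \\
\sigma \models \forall v . f & \text{iff} & \sigma(v \leftarrow \text{false}) \models f \text{ and } \sigma(v \leftarrow \text{true}) \models f.
\end{array}
\]

For a vector $\vec{v} = (v_1, \ldots, v_m)$ of propositional variables in $V$, we define the abbreviations

\[ \exists \vec{v} . f = \exists v_1 . (\cdots (\exists v_m . f) \cdots) \tag{2.1} \]

\[ \forall \vec{v} . f = \forall v_1 . (\cdots (\forall v_m . f) \cdots). \tag{2.2} \]

The support of a formula $f$ is the set of variables that $f$ depends on,
$\{v \in V : f|_{v \leftarrow \text{true}} \neq f|_{v \leftarrow \text{false}}\}$.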
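
The semantics above translate almost line by line into a small recursive evaluator. The following Python sketch is only an illustration: the tuple encoding of formulas and the function names `holds` and `support` are assumptions of this example, and the brute-force enumeration is exponential in the number of variables.

```python
from itertools import product

# Assumed encoding: ("var", v), ("not", f), ("and", f, g), ("or", f, g),
# ("exists", v, f), ("forall", v, f). A truth assignment is a dict.


def holds(f, sigma):
    """Return True iff the truth assignment `sigma` satisfies formula `f`."""
    op = f[0]
    if op == "var":
        return sigma[f[1]]
    if op == "not":
        return not holds(f[1], sigma)
    if op == "and":
        return holds(f[1], sigma) and holds(f[2], sigma)
    if op == "or":
        return holds(f[1], sigma) or holds(f[2], sigma)
    if op in ("exists", "forall"):
        v, body = f[1], f[2]
        # {**sigma, v: a} plays the role of the updated assignment sigma(v <- a).
        results = [holds(body, {**sigma, v: a}) for a in (False, True)]
        return any(results) if op == "exists" else all(results)
    raise ValueError(f"unknown operator: {op}")


def support(f, variables):
    """Return the variables whose value can change the truth of `f`."""
    result = set()
    for v in variables:
        others = [w for w in variables if w != v]
        for values in product((False, True), repeat=len(others)):
            sigma = dict(zip(others, values))
            if holds(f, {**sigma, v: True}) != holds(f, {**sigma, v: False}):
                result.add(v)
                break
    return result


# Example: the formula "exists y . (x and y)" depends only on x.
f = ("exists", "y", ("and", ("var", "x"), ("var", "y")))
```

For this `f`, `holds(f, {"x": True})` is true and `support(f, ["x", "y"])` is `{"x"}`, since the quantifier removes the dependence on `y`.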


2.1.2  Kripke  Structures 

A Kripke structure [111, 40] is a finite state transition graph that can be used to capture the
intuition about the behavior of a finite transition system. The standard definition of a Kripke
structure is a set of states, a set of transitions between states, and a function that labels each
state with a set of propositions that are true in this state. For the purpose of this thesis,
however, we consider a simplified version of the standard definition without propositions.1

1 Similar restrictions have been used in model checking [39].

Definition 2.3 (Kripke Structure) A Kripke structure $\mathcal{K}$ is a pair $\mathcal{K} = (S, R)$ where $S$ is
a finite set of states and $R \subseteq S \times S$ is a total transition relation.2

A state and a transition of a Kripke structure denote a state and a possible state change of
the finite transition system the Kripke structure represents. A path in a Kripke structure
represents an execution of the system. A path $\pi$ in $\mathcal{K}$ is an infinite sequence $s_0 s_1 s_2 \cdots$ of
states in $S$ such that, for $i \geq 0$, $(s_i, s_{i+1}) \in R$.

Example 2.1 A Kripke structure with four states $S = \{A, B, C, D\}$ and five transitions
$R = \{(A, B), (B, D), (C, A), (C, D), (D, C)\}$ is shown in Figure 2.1.

Figure 2.1: A Kripke structure with four states and five transitions.
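
The structure of Example 2.1 is small enough to write down explicitly. The Python sketch below encodes it with plain sets and checks that the transition relation is total and that a given state sequence is a prefix of a path. The helper names are assumptions of this example.

```python
# Kripke structure of Example 2.1.
S = {"A", "B", "C", "D"}
R = {("A", "B"), ("B", "D"), ("C", "A"), ("C", "D"), ("D", "C")}


def successors(R, s):
    """Return the states reachable from s by one transition."""
    return {t for (u, t) in R if u == s}


def is_total(S, R):
    """A transition relation is total iff every state has a successor."""
    return all(successors(R, s) for s in S)


def is_path_prefix(R, states):
    """Check that consecutive states are related by R."""
    return all((s, t) in R for s, t in zip(states, states[1:]))
```

Here `is_total(S, R)` is true, and `["C", "D", "C", "A", "B"]` is a valid path prefix while `["A", "C"]` is not.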


2.1.3  Computation  Tree  Logic 

Computation  Tree  Logic  (CTL)  [10,  55]  is  a  branching-time  temporal  logic  to  specify  the 
behavior  of  a  system  represented  by  a  Kripke  structure.  In  branching-time  temporal  logics, 
the  underlying  structure  of  time  is  assumed  to  have  a  branching  tree-like  nature  where  each 
moment  may  have  many  successor  moments.  For  a  Kripke  structure,  this  execution  tree  is 
formed  by  designating  a  state  in  the  Kripke  structure  as  an  initial  state  and  then  unwinding 
the  structure  into  an  infinite  tree  with  the  designated  state  as  root.  Each  path  in  the  tree  is 
a  path  in  the  Kripke  structure  and  represents  a  possible  execution  of  the  system  the  Kripke 
structure  models. 

Example 2.2 The execution tree starting in state $C$ of the Kripke structure shown in
Figure 2.1 is illustrated in Figure 2.2. □

We consider a small subset of CTL formulas sufficient for our purposes. CTL formulas
are composed of path quantifiers and temporal operators. The path quantifiers are used
to describe the branching structure in the execution tree. There are two such quantifiers, A
(“for all execution paths”) and E (“for some execution path”). These quantifiers are used
in a particular state to specify that all of the paths or some of the paths starting at that state
have some property. The temporal operators describe properties of a path through the tree.
We consider only one of these, U (“until”). It is used to combine two properties $\varphi$ and $\psi$. It holds
if there is a state on the path where $\psi$ holds, and at every preceding state on the path, $\varphi$
holds.

Figure 2.2: The execution tree produced from state C of the Kripke structure shown
in Figure 2.1.

² A transition relation is total iff $\forall s . \exists s' . (s, s') \in R$.

Definition 2.4 (CTL Syntax) Given a finite set of states $S$, the syntax of CTL formulas is
inductively defined as follows

• Each element of $2^S$ is a formula,

• $\neg\psi$, $E(\varphi\, U\, \psi)$, and $A(\varphi\, U\, \psi)$ are formulas if $\varphi$ and $\psi$ are.

CTL semantics is given with respect to Kripke structures. In the following inductive
definition of the semantics of CTL, $\mathcal{K}, q \models \psi$ denotes that $\psi$ holds in the state $q$ of the Kripke
structure $\mathcal{K}$.

Definition 2.5 (CTL Semantics) Given a Kripke structure $\mathcal{K} = (S, R)$, the semantics of
CTL formulas is inductively defined as follows

• $\mathcal{K}, q_0 \models P$ iff $q_0 \in P$,

• $\mathcal{K}, q_0 \models \neg\psi$ iff $\mathcal{K}, q_0 \not\models \psi$,

• $\mathcal{K}, q_0 \models E(\varphi\, U\, \psi)$ iff there exists a path $q_0 q_1 \cdots$ and $i \geq 0$ such that $\mathcal{K}, q_i \models \psi$ and,
for all $0 \leq j < i$, $\mathcal{K}, q_j \models \varphi$,

• $\mathcal{K}, q_0 \models A(\varphi\, U\, \psi)$ iff for all paths $q_0 q_1 \cdots$ there exists $i \geq 0$ such that $\mathcal{K}, q_i \models \psi$ and,
for all $0 \leq j < i$, $\mathcal{K}, q_j \models \varphi$.

We will use three abbreviations

$AF\, \psi = A(S\, U\, \psi)$,  (2.3)

$EF\, \psi = E(S\, U\, \psi)$,  (2.4)

$AG\, \psi = \neg EF\, \neg\psi$.  (2.5)

F stands for “future” or “eventually”. Since $S$ is the complete set of states in the Kripke
structure, the CTL formula $S$ holds in any state. Thus, $AF\, \psi$ means that for all execution
paths, a state where $\psi$ holds will eventually be reached. Similarly, $EF\, \psi$ means that there
exists an execution path reaching a state where $\psi$ holds. G stands for “globally” or “always”.
The formula $AG\, \psi$ holds if every state on any execution path satisfies $\psi$.

Example 2.3 Let $\mathcal{K}$ denote the Kripke structure shown in Figure 2.1. Since four of the
transitions form a cycle $C A B D C$, we have that from any state visited on an execution path
produced from $C$, $C$ can be reached. Thus, $\mathcal{K}, C \models AG\, EF\, \{C\}$. □

We  will  often  consider  CTL  formulas  on  execution  trees  produced  from  a  set  of  states.  To 
simplify  the  presentation,  we  therefore  introduce  the  short  notation 

$\mathcal{K}, Q \models \psi = \forall q \in Q . \mathcal{K}, q \models \psi$.  (2.6)
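
The semantics above can be checked on small explicit structures by computing, for each formula, the set of states in which it holds. The sketch below is an illustration only, not the thesis's algorithm: it uses the standard least-fixed-point characterization of the until operators, relies on the transition relation being total, and all function names are assumptions. It confirms Example 2.3 on the structure of Figure 2.1.

```python
S = frozenset("ABCD")
R = {("A", "B"), ("B", "D"), ("C", "A"), ("C", "D"), ("D", "C")}


def pre_exists(R, Q):
    """States with at least one successor in Q."""
    return {s for (s, t) in R if t in Q}


def pre_forall(S, R, Q):
    """States all of whose successors are in Q (R is assumed total)."""
    return {s for s in S if all(t in Q for (u, t) in R if u == s)}


def EU(S, R, phi, psi):
    """Set of states satisfying E(phi U psi), as a least fixed point."""
    Z = set(psi)
    while True:
        new = Z | (set(phi) & pre_exists(R, Z))
        if new == Z:
            return Z
        Z = new


def AU(S, R, phi, psi):
    """Set of states satisfying A(phi U psi), as a least fixed point."""
    Z = set(psi)
    while True:
        new = Z | (set(phi) & pre_forall(S, R, Z))
        if new == Z:
            return Z
        Z = new


def EF(S, R, psi):
    return EU(S, R, S, psi)          # abbreviation (2.4)


def AF(S, R, psi):
    return AU(S, R, S, psi)          # abbreviation (2.3)


def AG(S, R, psi):
    return set(S) - EF(S, R, set(S) - set(psi))   # abbreviation (2.5)


# Example 2.3: C satisfies AG EF {C}.
holds_in_C = "C" in AG(S, R, EF(S, R, {"C"}))
```

Here a formula is identified with the set of states satisfying it, so a set of states $Q$ satisfies $\psi$ in the sense of (2.6) exactly when $Q$ is a subset of the computed set.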

2.2  Binary  Decision  Diagrams 

A reduced ordered Binary Decision Diagram (BDD) is a rooted Directed Acyclic Graph
(DAG)  representing  a  Boolean  function  on  a  set  of  linearly  ordered  Boolean  variables. 
It  has  one  or  two  terminal  nodes  labeled  1  or  0,  and  a  set  of  variable  nodes.  Each  variable 
node  is  associated  with  a  Boolean  variable  and  has  two  outgoing  edges  low  and  high.  Given 
an  assignment  of  the  variables,  the  value  of  the  Boolean  function  is  determined  by  a  path 
starting  at  the  root  node  and  recursively  following  the  high  edge,  if  the  associated  variable 
is  true,  and  the  low  edge,  if  the  associated  variable  is  false.  The  function  value  is  true,  if 
the  label  of  the  reached  terminal  node  is  1;  otherwise  it  is  false.  The  graph  is  ordered  such 
that  all  paths  in  the  graph  respect  the  ordering  of  the  variables. 

Example 2.4 A BDD representing the function $f(x_1, x_2) = x_1 \vee \neg x_1 \wedge \neg x_2$ is shown in
Figure 2.3. □

A BDD is reduced such that no two distinct nodes $u$ and $v$ have the same variable name
and low and high successors (Figure 2.4a), and no variable node $u$ has identical low and
high successors (Figure 2.4b). Due to these reductions, the number of nodes in a BDD of
a regularly structured function is often much smaller than the number of truth assignments
of the function. In particular, it can be shown that the size of a BDD representing any
symmetric function only grows polynomially with the number of variables of the function
[26].

Figure 2.3: A BDD representing the function $f(x_1, x_2) = x_1 \vee \neg x_1 \wedge \neg x_2$. High
and low edges are drawn with solid and dashed lines, respectively.

Definition 2.6 A Boolean function $f \in B_n$ is called symmetric if each permutation $p$ of the
variables does not change the function value, i.e.,

$f(x_1, \cdots, x_n) = f(x_{p(1)}, \cdots, x_{p(n)})$.


Another advantage is that the reductions make BDDs canonical. Large space savings can
be obtained by representing a collection of BDDs in a single multirooted graph where the
subgraphs of the BDDs are shared. Due to the canonicity, two BDDs are identical if and
only if they have the same root. Consequently, when using this representation, an equivalence
check between two BDDs can be performed in constant time. In addition, BDDs are easy
to manipulate. Any Boolean operation $f \star g$ on two BDDs $f$ and $g$ can be carried out in
$O(|f||g|)$. The size of a BDD can depend critically on the variable ordering. To find an
optimal ordering is a co-NP-complete problem in itself [26], but as illustrated in
Example 2.5, a good heuristic for choosing an ordering is to locate variables close to each other
if knowledge about their truth assignment removes a lot of uncertainty about the truth value
of the Boolean function [40].

Figure 2.4: Reductions of BDDs. (a) Nodes associated to the same variable with
equal low and high successors will be converted to a single node. (b) Nodes causing
redundant tests on a variable are eliminated.

Example 2.5 For the function $(x_1 \wedge y_1) \vee (x_2 \wedge y_2) \vee \cdots \vee (x_n \wedge y_n)$ there is an exponential
difference between the size of a BDD with the variable ordering $x_1, y_1, \cdots, x_n, y_n$ and a
BDD with the variable ordering $x_1, \cdots, x_n, y_1, \cdots, y_n$. For the latter ordering, the lack of
information about the assignment of the $y$ variables in the top section of the BDD, where
the $x$ variables are tested, causes an exponential growth of the graph [119]. The problem is
illustrated for $n = 3$ in Figure 2.5. □


Figure 2.5: Two BDDs representing the function $(x_1 \wedge y_1) \vee (x_2 \wedge y_2) \vee (x_3 \wedge y_3)$.
The BDD in (a) only grows linearly with the number of variables in the expression,
while the BDD in (b) grows exponentially.
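
The effect of the two reduction rules and of the variable ordering can be reproduced with a very small BDD manager. The sketch below is an illustration under stated assumptions, not a real BDD package: the class `BDD` and its methods `mk`, `build`, and `size` are hypothetical names, and `build` enumerates all assignments, so it is only usable for a handful of variables.

```python
class BDD:
    """Tiny reduced ordered BDD manager; terminals are the integers 0 and 1."""

    def __init__(self, order):
        self.order = list(order)   # variable ordering, root first
        self.unique = {}           # (level, low, high) -> node id
        self.nodes = {}            # node id -> (level, low, high)

    def mk(self, level, low, high):
        if low == high:            # redundant test is skipped (Figure 2.4b)
            return low
        key = (level, low, high)
        if key not in self.unique: # equal nodes are shared (Figure 2.4a)
            node = len(self.unique) + 2
            self.unique[key] = node
            self.nodes[node] = key
        return self.unique[key]

    def build(self, f, level=0, env=None):
        """Shannon-expand the predicate f along the ordering (exponential time)."""
        env = {} if env is None else env
        if level == len(self.order):
            return 1 if f(env) else 0
        var = self.order[level]
        low = self.build(f, level + 1, {**env, var: False})
        high = self.build(f, level + 1, {**env, var: True})
        return self.mk(level, low, high)

    def size(self, root):
        """Number of variable nodes reachable from root."""
        seen, stack = set(), [root]
        while stack:
            u = stack.pop()
            if u > 1 and u not in seen:
                seen.add(u)
                stack.extend(self.nodes[u][1:])
        return len(seen)


def pairs(n):
    """The function (x1 and y1) or ... or (xn and yn) of Example 2.5."""
    return lambda env: any(env[f"x{i}"] and env[f"y{i}"] for i in range(1, n + 1))


n = 3
interleaved = BDD([v for i in range(1, n + 1) for v in (f"x{i}", f"y{i}")])
separated = BDD([f"x{i}" for i in range(1, n + 1)] + [f"y{i}" for i in range(1, n + 1)])
size_interleaved = interleaved.size(interleaved.build(pairs(n)))
size_separated = separated.size(separated.build(pairs(n)))
```

For $n = 3$ the interleaved ordering yields 6 variable nodes and the separated ordering 14, matching the linear versus exponential growth described in Example 2.5. Because nodes are shared through the `unique` table, two equivalent functions built in the same manager end up with the same root id, which is the constant-time equivalence check mentioned above.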


For  a  comprehensive  introduction  to  BDDs  and  branching  programs  in  general,  we  refer 
the  reader  to  Bryant’s  original  paper  [26]  and  the  books  [121,  168]. 

BDD  Packages 

A BDD package is a collection of efficient data structures and algorithms for representing
and computing basic operations on BDDs. Modern BDD packages (e.g., [158, 112])
typically share the following common implementation features based on [25, 146]: 1) a single
shared BDD with several roots representing a set of BDDs, 2) a set of dynamic
programming algorithms for carrying out operations on the BDDs that, due to a large number of
distinct subproblems, use a cache instead of a memoization table, and 3) data structures that
facilitate dynamic variable reordering and garbage collection of unreferenced BDD nodes
that is invoked when the percentage of unreferenced BDD nodes exceeds a preset threshold.
The three major parameters are: the initial number of nodes allocated to the shared BDD,
the cache size, and the type of dynamic variable reordering, if any.


2.3  Symbolic  Model  Checking 

Model  checking  (e.g.,  [40])  is  a  subfield  of  computer  science  that  applies  efficient  search 
procedures  to  determine  if  a  finite  transition  system  fulfills  its  specification.  In  other  words, 
the  transition  system  is  checked  to  see  whether  it  is  a  model  of  its  specification.  Given  a 
Kripke structure $\mathcal{K}$ representing the system, a specification is a CTL formula that must hold
in the initial state $s_0$ of the system. For our purpose, it is sufficient only to consider invariant
specifications $\mathcal{K}, s_0 \models AG\, I$ where all states reachable from $s_0$ must be within the set $I$. An
invariant specification is verified by performing a reachability analysis to find the set of
states $R$ reachable from $s_0$. It is then checked whether $R$ is a subset of $I$. Obviously, an
exhaustive explicit exploration faces the state space explosion problem. In 1987 McMillan
suggested using BDDs to search the state space implicitly to address this problem. He
coined the technique symbolic model checking [119].

The basic idea in symbolic model checking is to use a BDD to represent the characteristic
function of a set of states and the transition relation. Given a set $A$, its characteristic
function $A(x) = x \in A$ is a Boolean function identifying all elements of $A$. This is an
implicit representation of $A$ since the size of the BDD representing the characteristic function
of $A$ does not necessarily grow linearly with the cardinality of $A$. Due to the isomorphism
between set algebra and Boolean algebra, union, intersection, and complement of sets
correspond to disjunction, conjunction, and negation of their characteristic functions. In the
sequel, we will not distinguish between set operations and their corresponding Boolean
operations. Given a Kripke structure $\mathcal{K} = (S, R)$, we can use a vector of Boolean state
variables $\vec{v} = (v_1, \cdots, v_{\lceil \log |S| \rceil})$ to represent a state. Any subset of states $Q$ can be
represented by a Boolean function $Q(\vec{v})$ that can be encoded as a BDD. Similarly, a Boolean
function $R(\vec{v}, \vec{v}')$, where unprimed and primed variables denote current and next states, can
be used to represent the characteristic function of the transition relation.

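Example 2.6 The Kripke structure in Figure 2.1 can be represented with two state variables
$\vec{v} = (v_1, v_2)$ such that $A = (false, false)$, $B = (true, false)$, $C = (false, true)$, and
$D = (true, true)$. The characteristic function of the transition relation is then

$R(\vec{v}, \vec{v}') = \neg v_1 \wedge \neg v_2 \wedge v_1' \wedge \neg v_2'$
$\qquad \vee\; v_1 \wedge \neg v_2 \wedge v_1' \wedge v_2'$
$\qquad \vee\; \neg v_1 \wedge v_2 \wedge \neg v_1' \wedge \neg v_2'$
$\qquad \vee\; \neg v_1 \wedge v_2 \wedge v_1' \wedge v_2'$
$\qquad \vee\; v_1 \wedge v_2 \wedge \neg v_1' \wedge v_2'$. □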

The crucial idea in symbolic model checking is to compute previous and next states via
BDD operations. The next states of a set of states $C$ can be found by computing the image
of $C$

$\mathrm{Img}(C) = (\exists \vec{v} . C(\vec{v}) \wedge R(\vec{v}, \vec{v}'))[\vec{v}'/\vec{v}]$.  (2.7)

Existential  quantification  is  used  to  abstract  the  source  state.  The  input  to  Img(C)  is  the 
characteristic  function  of  C  in  current  state  variables  C(v).  The  output  is  the  characteristic 
function  in  current  state  variables  of  the  states  that  can  be  reached  by  a  single  transition 
from  C.  Notice  that  such  states  may  lie  within  C .  The  output  is  given  in  current  state 
variables  in  order  to  use  it  as  input  for  subsequent  image  computations.  The  reachable 
states  from  C  can  be  computed  by  composing  images  from  C  until  a  fixed  point  is  found. 
All  Boolean  functions  in  the  image  computation  are  represented  by  BDDs,  and  all  Boolean 
operations  are  carried  out  directly  on  these  BDDs.  In  the  sequel,  we  will  not  distinguish 
between  Boolean  operations  and  their  corresponding  BDD  operations. 

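Example 2.7 For the state $C$ in the Kripke structure shown in Figure 2.1, we have $C(\vec{v}) =
\neg v_1 \wedge v_2$. The image of $C$ is given by

$\mathrm{Img}(C) = (\exists \vec{v} . (\neg v_1 \wedge v_2) \wedge R(\vec{v}, \vec{v}'))[\vec{v}'/\vec{v}]$

$\qquad = (\exists \vec{v} . \neg v_1 \wedge v_2 \wedge \neg v_1' \wedge \neg v_2' \vee \neg v_1 \wedge v_2 \wedge v_1' \wedge v_2')[\vec{v}'/\vec{v}]$

$\qquad = (\neg v_1' \wedge \neg v_2' \vee v_1' \wedge v_2')[\vec{v}'/\vec{v}]$

$\qquad = \neg v_1 \wedge \neg v_2 \vee v_1 \wedge v_2$.

Thus, as expected, we get $\mathrm{Img}(C) = \{A, D\}$. □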

Previous states of a set of states $C$ can be found by a similar computation called the
preimage of $C$

$\mathrm{PreImg}(C) = \exists \vec{v}' . R(\vec{v}, \vec{v}') \wedge C(\vec{v})[\vec{v}/\vec{v}']$.  (2.8)

Again,  the  input  is  the  characteristic  function  of  C  in  current  state  variables.  The  output  is 
the  characteristic  function  (in  current  state  variables)  of  the  states  from  which  a  state  in  C 
can  be  reached  by  a  single  transition. 
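
The image and preimage computations can be mimicked without a BDD package by treating characteristic functions as Python predicates over bit vectors and implementing existential quantification by enumeration. This is only a sketch of the logic in (2.7) and (2.8) under that assumption; a symbolic implementation would perform the same steps with BDD operations. The encoding follows Example 2.6, and all function names are illustrative.

```python
from itertools import product

# Encoding of Example 2.6: state name -> (v1, v2).
ENC = {"A": (False, False), "B": (True, False), "C": (False, True), "D": (True, True)}
EDGES = [("A", "B"), ("B", "D"), ("C", "A"), ("C", "D"), ("D", "C")]
VECTORS = list(product([False, True], repeat=2))


def R(v, w):
    """Characteristic function of the transition relation, R(v, v')."""
    return any(v == ENC[s] and w == ENC[t] for s, t in EDGES)


def char(names):
    """Characteristic function of a set of named states."""
    members = {ENC[x] for x in names}
    return lambda v: v in members


def img(C):
    """(exists v . C(v) and R(v, v'))[v'/v], returned over current-state variables."""
    return lambda w: any(C(v) and R(v, w) for v in VECTORS)


def preimg(C):
    """exists v' . R(v, v') and C(v)[v/v']."""
    return lambda v: any(R(v, w) and C(w) for w in VECTORS)


def states(pred):
    """Decode a characteristic function back into state names."""
    return {name for name, v in ENC.items() if pred(v)}


def reachable(C):
    """Compose images until a fixed point is found."""
    reached = states(C)
    while True:
        nxt = reached | states(img(char(reached)))
        if nxt == reached:
            return reached
        reached = nxt
```

With this encoding, `states(img(char({"C"})))` gives `{"A", "D"}`, in agreement with Example 2.7.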
2.3.1 Partitioning

A common problem when computing the image and preimage is that the intermediate
BDDs tend to be large compared to the BDD representing the result. Another problem is
that the transition relation may grow very large if represented by a single BDD (a monolithic
transition relation). In symbolic model checking, one of the most successful approaches to
solve this problem is transition relation partitioning [28]. The technique relies on the
observation that a system often can be characterized as either asynchronous with interleaved
activity or synchronous with simultaneous activity. Consider the system model shown in
Figure 2.6. For each computation, the state variables $v_1, \cdots, v_n$ are updated. Assume that
activity $i$ computes the next value of the state variables in $\vec{y}_i$ given the current value of the
state variables in $\vec{x}_i$ and is characterized by the transition relation $R_i(\vec{x}_i, \vec{y}_i')$. If the system
is asynchronous, only a single subsystem is active in a computation step and only the next
state variables of this subsystem change value. Otherwise, if the system is synchronous,
each subsystem is active in a computation step and calculates a new value of its associated
state variables. In this case, we must assume that the sets of variables changed by the
subsystems form a partitioning of the state variables. Let $Z$ denote the state variables in vector
$\vec{z}$.

Figure 2.6: System model.

In the asynchronous case, the total transition relation is

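$R(\vec{v}, \vec{v}') = \bigvee_{i=1}^{n} \left( R_i(\vec{x}_i, \vec{y}_i') \wedge \bigwedge_{v \notin Y_i} (v' \Leftrightarrow v) \right)$  (2.9)

while in the synchronous case, it is

$R(\vec{v}, \vec{v}') = \bigwedge_{i=1}^{n} R_i(\vec{x}_i, \vec{y}_i')$.  (2.10)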

Thus, the transition relation can either be represented as a disjunctive partitioning or a
conjunctive partitioning of subrelations. The main point about partitioning is that the complete
transition relation never needs to be computed since both the image and preimage
computations can be carried out directly on the subrelations. In the asynchronous case, we
get

$\mathrm{Img}(C) = \bigvee_{i=1}^{n} (\exists \vec{y}_i . C(\vec{v}) \wedge R_i(\vec{x}_i, \vec{y}_i'))[\vec{y}_i'/\vec{y}_i]$.  (2.11)

We exploit that all variables except the ones modified by the active subsystem are
unchanged. Thus, no quantification over these variables is necessary. This often has a
substantial positive effect on the complexity of the computation. The reason is that the complexity
of quantification on BDDs may be exponential in the number of quantified variables.³ In
practice, it is often an advantage to merge some of the subrelations [143] and combine
quantification and disjunction to a single BDD operation (e.g., [112]). A similar approach
can be used to simplify the preimage computation

$\mathrm{PreImg}(C) = \bigvee_{i=1}^{n} \exists \vec{y}_i' . R_i(\vec{x}_i, \vec{y}_i') \wedge C(\vec{v})[\vec{y}_i/\vec{y}_i']$.  (2.12)
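
The following explicit-state sketch illustrates why the disjunctive partitioning avoids the monolithic relation. It is not a BDD implementation: states are plain tuples, and the three activities in `ACTIVITIES` are hypothetical examples chosen for illustration. The partitioned image only touches the variables written by the active subsystem, while the monolithic relation of (2.9) has to state explicitly that all other variables keep their value.

```python
from itertools import product

N = 3
STATES = list(product([False, True], repeat=N))

# Hypothetical activities: (indices of the written variables Y_i,
#                           function from current state to the set of allowed new values).
ACTIVITIES = [
    ((0,), lambda s: {(not s[0],)}),                # toggle v1
    ((1,), lambda s: {(s[0],)}),                    # copy v1 into v2
    ((2,), lambda s: {(s[0] and s[1],), (True,)}),  # nondeterministic update of v3
]


def partitioned_image(C):
    """Union of per-activity images; unwritten variables are simply copied."""
    result = set()
    for written, step in ACTIVITIES:
        for s in C:
            for values in step(s):
                t = list(s)
                for index, value in zip(written, values):
                    t[index] = value
                result.add(tuple(t))
    return result


def R(s, t):
    """Monolithic relation (2.9): one activity fires, all other variables are unchanged."""
    return any(
        tuple(t[i] for i in written) in step(s)
        and all(t[i] == s[i] for i in range(N) if i not in written)
        for written, step in ACTIVITIES
    )


def monolithic_image(C):
    """Image computed from the monolithic relation by enumerating all target states."""
    return {t for t in STATES if any(R(s, t) for s in C)}
```

Both functions return the same set for any `C`; the partitioned version simply never constructs the frame conditions $v' \Leftrightarrow v$.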

The  conjunctive  case  is  more  complicated  due  to  the  fact  that  existential  quantification  does 
not  distribute  over  conjunction.  However,  the  subrelations  can  be  moved  out  of  scope  of 
existential  quantification  if  they  do  not  depend  on  any  of  the  variables  being  quantified. 
This  technique  is  often  referred  to  as  early  quantification.  For  the  image  computation,  we 
get 

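$\mathrm{Img}(C) = \left( \exists \vec{z}_n . ( \cdots (\exists \vec{z}_1 . C(\vec{v}) \wedge R_1(\vec{x}_1, \vec{y}_1')) \cdots ) \wedge R_n(\vec{x}_n, \vec{y}_n') \right)[\vec{v}'/\vec{v}]$  (2.13)

where $Z_j \cap \bigcup_{i=j+1}^{n} X_i = \emptyset$ for $1 \leq j < n$ and $\bigcup_{i=1}^{n} Z_i = \{v_1, \cdots, v_n\}$.

Similarly, for the preimage computation, we have

$\mathrm{PreImg}(C) = \exists \vec{z}_n' . R_n(\vec{x}_n, \vec{y}_n') \wedge ( \cdots (\exists \vec{z}_1' . R_1(\vec{x}_1, \vec{y}_1') \wedge C(\vec{v})[\vec{v}/\vec{v}']) \cdots )$  (2.14)

where $Z_j' \cap \bigcup_{i=j+1}^{n} Y_i' = \emptyset$ for $1 \leq j < n$ and $\bigcup_{i=1}^{n} Z_i' = \{v_1', \cdots, v_n'\}$.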
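
As a small worked instance of early quantification, assume (hypothetically) a synchronous system with two state variables and two partitions, where $R_1$ reads only $v_1$ and $R_2$ reads only $v_2$. Then $Z_1 = \{v_1\}$ and $Z_2 = \{v_2\}$ satisfy the side condition $Z_1 \cap X_2 = \emptyset$, and since $R_2$ does not mention $v_1$ it can be moved out of the scope of the inner quantifier:

```latex
\begin{align*}
\mathrm{Img}(C)
  &= \bigl(\exists v_1, v_2 \,.\; C(v_1, v_2) \wedge R_1(v_1, v_1') \wedge R_2(v_2, v_2')\bigr)[\vec{v}'/\vec{v}] \\
  &= \bigl(\exists v_2 \,.\; (\exists v_1 \,.\; C(v_1, v_2) \wedge R_1(v_1, v_1')) \wedge R_2(v_2, v_2')\bigr)[\vec{v}'/\vec{v}]
\end{align*}
```

The inner quantification removes $v_1$ before $R_2$ is conjoined, so the intermediate result never depends on $v_1$, $v_2$, $v_1'$, and $v_2'$ all at once.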

A large number of heuristics have been developed for choosing and arranging partitions
in the conjunctive case (e.g., [143, 120]). The main idea is to avoid a blow-up of the
intermediate BDDs of the image and preimage computation by reducing the life span of
variables. Assume that a variable is introduced in the computation by partition $i$ and that
the variable is removed again by the existential quantification associated with partition $j$.
The life span of the variable is then $j - i$. Another approach to reduce the complexity of the
image and preimage computation is to compress the transition relation using an approach
called iterative squaring [27]. The idea is to incrementally compute the closure of the
transition relation. This computation, however, is often very complex. It is normally only
an advantage if the domain has very high sequential depth (e.g., due to counters [61]).

³ From an artificial intelligence point of view, this simplified image computation is an efficient solution to
the frame problem of asynchronous systems.

2.3.2  Frontier  Set  Simplification 

Partitioning is often combined with frontier set simplification [41]. The purpose of frontier
set simplification is to reduce the size of the BDDs representing the frontier of backward or
forward search based on the image or preimage computation. Consider computing the set
of states reachable from $C$ using the image computation. The set of states $R_1$ that can be
reached in one step or less is

$R_1 = \mathrm{Img}(C) \cup C$.

Similarly, the set of states $R_2$ that can be reached in two steps or less is

$R_2 = \mathrm{Img}(R_1) \cup R_1$.

This computation can be simplified by only computing next states from the frontier of the
search

$R_2 = \mathrm{Img}(R_1 \setminus C) \cup R_1$.

The set $F = R_1 \setminus C$ may have a large BDD representation. However, we can choose
it anywhere in the range $R_1 \setminus C \subseteq F \subseteq R_1$ to obtain a small BDD. The research on
frontier set simplification has developed several heuristic BDD operations for finding a
good candidate for $F$.
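
The difference between the two reachability computations can be sketched on explicit sets, using the Kripke structure of Figure 2.1. This is an illustration only: with explicit sets the frontier is always chosen as the lower bound $R_i \setminus R_{i-1}$, whereas a BDD-based implementation would be free to pick any $F$ in the allowed range that has a small representation. The function names are assumptions.

```python
R = {("A", "B"), ("B", "D"), ("C", "A"), ("C", "D"), ("D", "C")}


def img(Q):
    """Explicit-state image of a set of states."""
    return {t for (s, t) in R if s in Q}


def reachable_naive(C):
    """R_{i+1} = Img(R_i) | R_i until a fixed point is reached."""
    reached = set(C)
    while True:
        nxt = img(reached) | reached
        if nxt == reached:
            return reached
        reached = nxt


def reachable_frontier(C):
    """Same fixed point, but images are only taken of newly found states."""
    reached = set(C)
    frontier = set(C)
    while frontier:
        frontier = img(frontier) - reached
        reached |= frontier
    return reached
```

Both functions compute the same set; the second one applies `img` to progressively smaller arguments.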

2.3.3  Splitting 

A more direct approach for computing the image and preimage of a set of states is to
use a transition function. Consider for example a transition system with $n$ state variables
$v_1, \cdots, v_n$ and the transition function $f : B^n \to B^n$ given by

$v_1' = f_1(v_1, \cdots, v_n)$,
$\quad \vdots$
$v_n' = f_n(v_1, \cdots, v_n)$.

The image of a state $s \in B^n$ is the mathematical image $f(s)$ of $s$. To find the image $f(S)$
of all states $S$ in the domain (the unrestricted image) a technique called input splitting
can be employed (e.g., [121]). It is based on the observation that the unrestricted image
computation can be decomposed with respect to the input variables

$f(S) = f|_{v_i \leftarrow true}(S) \vee f|_{v_i \leftarrow false}(S)$.

The decomposition is carried out recursively until each restricted $f$ function has constant
elements. The characteristic function of the image (in next state variables) is given by the
resulting expression. The approach can be extended to an arbitrary set of states and is
particularly efficient for problems where it is impossible to arrange a conjunctive partitioning
to allow an efficient early quantification [123].
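
The recursion behind input splitting can be sketched as follows. This is a toy version under strong assumptions: component functions are plain Python callables on bit tuples, and constancy of a restricted function is detected by enumeration, which a BDD-based implementation would instead read off the BDD directly. The names `image_by_splitting` and `FS` are illustrative.

```python
from itertools import product


def image_by_splitting(fs, n):
    """Unrestricted image of f = (f_1, ..., f_m) over n Boolean inputs.

    Inputs are fixed one at a time (v_1 first). Recursion stops as soon as
    every component of the restricted function is constant.
    """

    def constant_output(prefix):
        # Evaluate the restricted function on all completions of the prefix.
        rest = product([False, True], repeat=n - len(prefix))
        outputs = {tuple(f(prefix + r) for f in fs) for r in rest}
        return outputs if len(outputs) == 1 else None

    def split(prefix):
        constant = constant_output(prefix)
        if constant is not None:
            return constant
        # f(S) = f|v<-true (S)  union  f|v<-false (S)
        return split(prefix + (True,)) | split(prefix + (False,))

    return split(())


# Hypothetical transition function: v1' = v1 and v2, v2' = v1 or v2.
FS = [lambda v: v[0] and v[1], lambda v: v[0] or v[1]]
IMAGE = image_by_splitting(FS, 2)
```

For this example the state `(True, False)` is not in the image, since $v_1 \wedge v_2$ can only be true when $v_1 \vee v_2$ is.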

2.3.4  BDD  Package  Adjustment 

An invariant specification is checked by performing a sequence of image computations
until the set of covered states reaches a fixed point. Experimental studies indicate that the
BDD package parameters should be adjusted differently for BDD-based model checking
compared to circuit verification where a BDD representing a digital circuit is compared to
a BDD representing its specification [177]. The experiments show that model checking
computations have a large number of repeated subproblems across the top level operations.
Thus, a large cache size is more important for model checking than for circuit
verification. Furthermore, model checking computations can have a very high death and rebirth
rate (unreferenced nodes being referenced again) compared to circuit computations. Thus,
garbage collection should occur less frequently, which for example can be accomplished by
initially allocating a large number of nodes for the shared BDD. Finally, dynamic
reordering of variables is efficient given a bad initial variable ordering, but given a good initial
variable ordering the time spent on reordering does not pay off.


2.4  Heuristic  Search 

A classical search problem is similar to a deterministic planning problem defined in
Definition 3.2. The only difference is that actions are associated with a positive cost.⁴ Let the
function $c : Act \to \mathbb{R}^+$ define action costs. A solution to a classical search problem is a
deterministic plan $\pi = a_1 \cdots a_n$. The length of $\pi$ is $n$. The cost of $\pi$ is

$Cost(\pi) = \sum_{i=1}^{n} c(a_i)$.  (2.15)

⁴ The cost of an action must be positive since we require that infinite paths have unbounded total cost.

Definition 2.7 (Optimal Search Problem Solution) A solution $\pi$ to a search
problem is optimal if it has minimum cost.

Example 2.8 The deterministic planning problem shown in Figure 3.1 can be extended to
a search problem by adding action costs as shown in Figure 2.7. We have

$S = \{A, B, C, D\}$,

$Act = \{\alpha, \beta, \gamma\}$,

$\to = \{(A, \beta, B), (B, \gamma, D), (C, \alpha, A), (C, \beta, D), (D, \alpha, C)\}$,

$s_0 = C$,

$G = \{B\}$,

$c = \{\alpha \mapsto 1.0, \beta \mapsto 1.0, \gamma \mapsto 2.0\}$.

An optimal solution is $\alpha\beta$ with a length and cost of 2. □


Figure 2.7: A search problem derived from the deterministic planning problem
discussed in Example 3.1. The $h$-values associated with each state define a heuristic
function used in Example 2.9.


Classical search algorithms like A* and pure heuristic search are characterized by
building a search tree superimposed over the state space during the search process. Each search
node in the tree is a pair $(s, I)$ where $s$ is a single state and $I \in \mathbb{R}^d$ is a $d$-dimensional
vector of real numbers representing information associated with the node to guide the search.
The root of the search tree contains the initial state. We will assume that the initial state
belongs to a search node with node information $I_0$. The leaf nodes of the tree correspond
to states that do not have successors in the tree, either because they have not been expanded
yet, or because they were expanded, but had no children. In each step, the search algorithm
chooses one leaf node to expand. The collection of unexpanded nodes is called the fringe
or frontier of the search tree. It is important to distinguish between the search domain and
the search tree. For finite but cyclic search domains, the search tree may be infinite.
Best-First Search (BFS) describes a collection of search algorithms, each of which has a cost
associated with each node and at each node expansion cycle chooses a node of lowest cost
to expand next. Depending on the particular cost function chosen, we get different BFS
algorithms. To lower the complexity of the node selection, the frontier is often implemented
as a priority queue on the node costs. Figure 2.8 shows the BFS algorithm. The solution
extraction function in line 5 simply obtains a solution by tracing back the actions from the
goal node to the root node. The Expand function in line 6 finds the set of child nodes of a
single node, and EnqueueAll inserts each child in the frontier queue.

function BFS($s_0$, $I_0$, $G$)
1  frontier ← MakeQueue(($s_0$, $I_0$))
2  loop
3    if |frontier| = 0 then return “no solution exists”
4    ($s$, $I$) ← RemoveTop(frontier)
5    if $s \in G$ then return ExtractSolution(frontier, ($s$, $I$))
6    frontier ← EnqueueAll(frontier, Expand(($s$, $I$)))


Figure  2.8:  The  Best-First  Search  (BFS)  algorithm. 
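
A runnable counterpart of the algorithm in Figure 2.8 can be sketched with a binary heap as the priority queue. Several details are assumptions made for this illustration: the node information is reduced to the path cost $g$, each node stores its plan instead of tracing it back from the goal node, and the heuristic values in `H` are hypothetical (the values of Figure 2.7 are not reproduced in the text). The problem data follow Example 2.8.

```python
import heapq
from itertools import count

# Search problem of Example 2.8: state -> list of (action, successor).
TRANSITIONS = {
    "A": [("beta", "B")],
    "B": [("gamma", "D")],
    "C": [("alpha", "A"), ("beta", "D")],
    "D": [("alpha", "C")],
}
COST = {"alpha": 1.0, "beta": 1.0, "gamma": 2.0}
# Hypothetical heuristic values (here the exact remaining cost to goal B).
H = {"A": 1.0, "B": 0.0, "C": 2.0, "D": 3.0}


def best_first_search(s0, goals, evaluate):
    """Tree search that always expands a frontier node of lowest evaluate(g, s).

    No duplicate detection is done, so on a cyclic domain without a
    solution the loop does not terminate.
    """
    tie = count()  # tie-breaker so the heap never compares plans
    frontier = [(evaluate(0.0, s0), next(tie), 0.0, s0, [])]
    while frontier:
        _, _, g, s, plan = heapq.heappop(frontier)        # RemoveTop
        if s in goals:
            return plan, g                                # ExtractSolution
        for action, t in TRANSITIONS[s]:                  # Expand
            g2 = g + COST[action]
            heapq.heappush(                               # EnqueueAll
                frontier, (evaluate(g2, t), next(tie), g2, t, plan + [action]))
    return None


def uniform_cost(g, s):
    return g


def a_star(g, s):
    return g + H[s]


plan, cost = best_first_search("C", {"B"}, a_star)
```

With either evaluation function the search returns the plan `["alpha", "beta"]` with cost 2.0, the optimal solution stated in Example 2.8.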


The A* algorithm is probably the best known and most well studied of the BFS
algorithms. A* sorts the unexpanded nodes in the priority queue in ascending order given by a
heuristic evaluation function $f$. The evaluation function is defined by

$f(n) = g(n) + h(n)$  (2.16)

where $g(n)$ is the cost of the path in the search tree leading from the root node to $n$, and
$h(n)$ is a heuristic function estimating the cost of a minimum cost path leading from the
state in $n$ to some goal state. Thus $f(n)$ estimates the minimum cost over all solution paths
constrained to go through the state in $n$.

Example 2.9 The search tree built by A* for the problem introduced in Example 2.8 and
the heuristic function defined in Figure 2.7 is shown in Figure 2.9. □

The properties of A* have been surveyed by Pearl [130]. A* is sound and complete,
since the node expansion operation is assumed to be correct, and infinite cyclic paths have
unbounded cost. A* further finds optimal solutions if the heuristic function $h(n)$ is
admissible, that is, $h(n)$ is a lower bound estimate such that $h(n) \leq h^*(n)$ for all $n$, where
$h^*(n)$ is the minimum cost of a path going from the state in $n$ to a goal state. A* is
optimally efficient for any admissible heuristic function. That is, no other optimal algorithm
is guaranteed to expand fewer nodes than A* [44]. It can be shown that every node on the
frontier with $f(n) < C^*$, where $C^*$ is the optimal cost, eventually will be expanded by
A*. Thus, the complexity of A* is directly tied to the accuracy of the estimates provided
by $h(n)$. When A* employs a perfectly informed heuristic ($h(n) = h^*(n)$), it is guided
directly toward the closest goal. At the other extreme, when no heuristic at all is available
($h(n) = 0$), the search becomes exhaustive, normally yielding exponential complexity. In
general, A* has linear complexity if the absolute error of the heuristic function is constant,
but it may have exponential complexity if the relative error is constant. Subexponential
complexity requires that the growth rate of the error is logarithmically bounded [147]

$|h(n) - h^*(n)| \in O(\log h^*(n))$.

Figure 2.9: Search tree example.

The complexity results are disappointing due to the fact that practical heuristic functions
often are based on a relaxation of the search problem that causes $h(n)$ to have constant or
near constant relative error. The results show that practical application of A* still may be
very search intensive. Often better performance of A* can be obtained by weighting the $g$-
and $h$-components of the evaluation function [140]

$f(n) = (1 - w) g(n) + w h(n)$, where $w \in [0, 1]$.  (2.17)

Weighted A* can be used to implement a wide range of BFS algorithms. Weights $w =$
0.0, 0.5, and 1.0 correspond to uniform cost search (Dijkstra’s algorithm), A*, and pure
heuristic search, respectively. Weighted A* is optimal in the range [0.0, 0.5] but often finds
solutions faster in the range (0.5, 1].
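
The three special cases follow by substituting the weight into (2.17). The short derivation below spells this out; the only step that is not a direct substitution is the observation that scaling an evaluation function by a positive constant does not change the order in which nodes are expanded.

```latex
\begin{align*}
w = 0:\quad           & f(n) = g(n)
    && \text{(uniform cost search)} \\
w = \tfrac{1}{2}:\quad & f(n) = \tfrac{1}{2}\bigl(g(n) + h(n)\bigr)
    && \text{(same node ordering as } g(n) + h(n) \text{, i.e. A*)} \\
w = 1:\quad           & f(n) = h(n)
    && \text{(pure heuristic search)}
\end{align*}
```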

Another  drawback  of  A*  is  that  its  space  complexity  is  very  high  due  to  the  explicit 
representation  of  the  search  tree.  For  that  reason  an  iterative  deepening  version  of  A* 
called  IDA*  has  been  developed  [107].  This  algorithm  has  space  complexity  linear  with 
the  search  depth.  However,  unless  the  search  domain  is  a  tree,  it  may  perform  a  highly 
redundant search.


2.5 Summary

This chapter has described Quantified Boolean Formulas (QBF) as a concise logic for
representing the complex Boolean operations involved in BDD-based planning. To specify
non-deterministic plans, we have presented Kripke structures and Computation Tree
Logic (CTL). We then spent two sections describing the key features of the Binary
Decision Diagram (BDD) and the techniques developed in model checking to represent and
search a state space efficiently with BDDs. Finally, we have described classical heuristic
search algorithms and heuristic search techniques.


Chapter 3

BDD-Based  Planning 


In  this  chapter,  we  describe  several  basic  encoding  and  search  techniques  for  deterministic 
and  non-deterministic  BDD-based  planning.  Section  3.1  defines  deterministic  planning  and 
presents  three  principles  for  encoding  STRIPS  planning  problems  with  BDDs.  Moreover,  it 
describes  a  general  BDD-based  bidirectional  breadth-first  search  algorithm  for  generating 
deterministic  plans.  Section  3.2  introduces  the  definition  of  non-deterministic  planning 
used  in  the  thesis  and  shows  how  to  encode  NADL  and  NADL+  planning  problems  with 
BDDs.  Finally,  it  introduces  a  general  BDD-based  backward  breadth-first  search  algorithm 
for  generating  weak,  strong  cyclic,  and  strong  non-deterministic  plans. 


3.1  Deterministic  Planning 

Classical AI planning considers domains with a finite set of states and a finite set of
deterministic actions.

Definition 3.1 (Deterministic Planning Domain) A deterministic planning domain is a
tuple $\langle S, Act, \rightarrow \rangle$ where $S$ is a finite set of states, $Act$ is a finite set of actions, and
$\rightarrow\ \subseteq S \times Act \times S$ is a deterministic transition relation of action effects. Instead of
$(s, a, s') \in\ \rightarrow$, we write $s \xrightarrow{a} s'$.

The transition relation $\rightarrow$ is deterministic if actions can lead to at most one possible next
state. That is,

$s \xrightarrow{a} p \,\wedge\, s \xrightarrow{a} q \;\Rightarrow\; p = q$.

An action $a$ is applicable in a state $s$ iff $s \xrightarrow{a} s'$ for some state $s'$. A deterministic planning
problem is given by a single initial state and a set of goal states.

Definition 3.2 (Deterministic Planning Problem) A deterministic planning problem is a
tuple $\langle \mathcal{D}, s_0, G \rangle$ where $\mathcal{D}$ is a deterministic planning domain, $s_0 \in S$ is an initial state, and
$G \subseteq S$ is a set of goal states.

A solution to (or plan for) a deterministic planning problem is a sequence of actions
forming a path from the initial state to a goal state.

Definition 3.3 (Deterministic Plan) Let $\mathcal{P}$ be a deterministic planning problem. A
solution or plan for $\mathcal{P}$ is a sequence of actions $\pi = a_1 \cdots a_n$ such that there exists a path
$q_0 \cdots q_n$ where $q_0 = s_0$, $q_n \in G$, and for $0 < i \le n$, we have $q_{i-1} \xrightarrow{a_i} q_i$.

The length of a plan $a_1 \cdots a_n$ is $n$. A plan is optimal if it has minimum length.

Example 3.1 The deterministic planning problem shown in Figure 3.1 has

$S = \{A, B, C, D\}$,
$Act = \{\alpha, \beta, \gamma\}$,
$\rightarrow\ = \{(A, \beta, B), (B, \gamma, D), (C, \alpha, A), (C, \beta, D), (D, \alpha, C)\}$,
$s_0 = C$,
$G = \{B\}$.

An optimal plan solving the problem is $\alpha\beta$ and has length 2. ◊


Figure 3.1: A deterministic planning problem with four states $A$, $B$, $C$, and $D$ and
three actions $\alpha$ (dashed), $\beta$ (solid), and $\gamma$ (dotted). The initial state is $C$ and the set of
goal states is the singleton set $\{B\}$.
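
As an illustration of Definitions 3.1 to 3.3, the following sketch stores the transition relation of Example 3.1 as explicit triples, checks that it is deterministic, and finds a minimum-length plan by forward breadth-first search. It is an explicit-state stand-in, not the BDD-based method of the thesis, and the names `TRANSITIONS`, `is_deterministic`, and `optimal_plan` are assumptions introduced here.

```python
from collections import deque

# Transition relation of Example 3.1 as (state, action, next_state) triples.
TRANSITIONS = {
    ("A", "beta", "B"), ("B", "gamma", "D"), ("C", "alpha", "A"),
    ("C", "beta", "D"), ("D", "alpha", "C"),
}

def is_deterministic(transitions):
    """True iff every (state, action) pair has at most one next state."""
    seen = {}
    for s, a, t in transitions:
        if seen.setdefault((s, a), t) != t:
            return False
    return True

def optimal_plan(transitions, s0, goals):
    """Forward breadth-first search; returns a shortest action sequence or None."""
    queue = deque([(s0, [])])
    visited = {s0}
    while queue:
        state, plan = queue.popleft()
        if state in goals:
            return plan
        for s, a, t in sorted(transitions):
            if s == state and t not in visited:
                visited.add(t)
                queue.append((t, plan + [a]))
    return None
```

For the problem of Example 3.1, `optimal_plan(TRANSITIONS, "C", {"B"})` yields the two-step plan consisting of alpha followed by beta.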


3.1.1  Encoding  STRIPS  Domains 

Classical deterministic planning problems are often written in planning languages such as
STRIPS [58], ADL [132], and PDDL [118]. In these languages, states are represented by
conjunctions of function-free ground predicates.¹ A STRIPS planning domain is a pair
$\mathcal{D} = \langle P, A \rangle$ where $P$ is a set of predicates and $A$ is a set of action schemas. Each action
schema is a tuple $\langle par, pre, add, del \rangle$ where $par$ is a set of parameter variables, and $pre$,
$add$, and $del$ are sets of predicates from $P$ where the only free variables are parameter
variables in $par$.

Example 3.2 The Gripper domain used in the AIPS-98 planning competition [113] is
shown in Figure 3.2. In this domain there are three action schemas and seven predicates, of
which three are used to indicate the type of objects. ◊


Predicates
    room(R)
    ball(B)
    gripper(G)
    atRobby(R)
    at(B, R)
    free(G)
    carry(O, G)

Actions
    Move
        par: FROM, TO
        pre: room(FROM), room(TO), atRobby(FROM)
        add: atRobby(TO)
        del: atRobby(FROM)

    Pick
        par: OBJ, ROOM, GRIPPER
        pre: ball(OBJ), room(ROOM), gripper(GRIPPER), at(OBJ, ROOM), atRobby(ROOM), free(GRIPPER)
        add: carry(OBJ, GRIPPER)
        del: at(OBJ, ROOM), free(GRIPPER)

    Drop
        par: OBJ, ROOM, GRIPPER
        pre: ball(OBJ), room(ROOM), gripper(GRIPPER), carry(OBJ, GRIPPER), atRobby(ROOM)
        add: at(OBJ, ROOM), free(GRIPPER)
        del: carry(OBJ, GRIPPER)

Figure 3.2: The Gripper domain of the AIPS-98 planning competition. A robot
called Robby has grippers to move objects between rooms. The Move action moves
Robby between rooms, while the Pick and Drop actions load and unload objects into
a gripper.


A STRIPS planning problem is a tuple $\langle \mathcal{D}, O, I, G \rangle$ where $\mathcal{D}$ is a STRIPS planning
domain, $O$ is a set of constant terms forming the objects of the problem, $I$ is a set of ground
predicates that are true in the initial state (all other ground predicates are assumed to be
false initially), and $G$ is a set of ground predicates that must be true in a goal state (all other
ground predicates can have arbitrary truth values). The actions of the problem are
generated from the action schemas by instantiating the parameters with objects of the problem.
In a given state $S$, an action $\langle pre, add, del \rangle$ is applicable if $pre \subseteq S$, and the resulting state
is $S' = (S \cup add) \setminus del$.

¹ More elaborate state representations exist, but representing states as sets of ground predicates is the main
idea.

Example 3.3 The objects and initial state of the Gripper planning problem shown in
Figure 3.3 specify that there are two rooms (rooma, roomb), two grippers (left, right), and that
the movable object is a ball (ball1). Initially, both Robby and the ball are in room A, and the
goal is to move the ball to room B. ◊

Objects
    rooma, roomb, ball1, left, right

Initial
    room(rooma), room(roomb), ball(ball1), atRobby(rooma), free(left), free(right),
    at(ball1, rooma), gripper(left), gripper(right)

Goal
    at(ball1, roomb)

Figure 3.3: A Gripper planning problem, with two rooms, one ball object to move,
and two grippers on Robby. Initially, both Robby and the ball are in room A, and the
goal is to move the ball to room B.
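
The STRIPS semantics above (an action is applicable if its preconditions are contained in the state, and the successor is the state plus the add set minus the delete set) can be sketched on the Gripper problem of Figures 3.2 and 3.3. The tuple encoding of ground predicates and the helper names `ground`, `apply_action`, and `run` are assumptions made for this illustration, not part of any planner described in the thesis.

```python
# Action schemas of Figure 3.2: name -> (parameters, pre, add, del).
SCHEMAS = {
    "Move": (("FROM", "TO"),
             [("room", "FROM"), ("room", "TO"), ("atRobby", "FROM")],
             [("atRobby", "TO")],
             [("atRobby", "FROM")]),
    "Pick": (("OBJ", "ROOM", "GRIPPER"),
             [("ball", "OBJ"), ("room", "ROOM"), ("gripper", "GRIPPER"),
              ("at", "OBJ", "ROOM"), ("atRobby", "ROOM"), ("free", "GRIPPER")],
             [("carry", "OBJ", "GRIPPER")],
             [("at", "OBJ", "ROOM"), ("free", "GRIPPER")]),
    "Drop": (("OBJ", "ROOM", "GRIPPER"),
             [("ball", "OBJ"), ("room", "ROOM"), ("gripper", "GRIPPER"),
              ("carry", "OBJ", "GRIPPER"), ("atRobby", "ROOM")],
             [("at", "OBJ", "ROOM"), ("free", "GRIPPER")],
             [("carry", "OBJ", "GRIPPER")]),
}

# Initial state and goal of Figure 3.3 as sets of ground predicates.
INITIAL = frozenset({
    ("room", "rooma"), ("room", "roomb"), ("ball", "ball1"),
    ("atRobby", "rooma"), ("free", "left"), ("free", "right"),
    ("at", "ball1", "rooma"), ("gripper", "left"), ("gripper", "right"),
})
GOAL = {("at", "ball1", "roomb")}

def ground(name, *objects):
    """Instantiate a schema's parameters with objects of the problem."""
    params, pre, add, delete = SCHEMAS[name]
    binding = dict(zip(params, objects))
    def inst(atoms):
        return frozenset((p, *(binding[v] for v in args)) for p, *args in atoms)
    return inst(pre), inst(add), inst(delete)

def apply_action(state, action):
    """Return (S | add) - del if pre is a subset of S, otherwise None."""
    pre, add, delete = action
    if not pre <= state:
        return None
    return (state | add) - delete

def run(plan, state=INITIAL):
    for step in plan:
        state = apply_action(state, ground(*step))
        if state is None:
            return None  # some step was not applicable
    return state

PLAN = [("Pick", "ball1", "rooma", "left"),
        ("Move", "rooma", "roomb"),
        ("Drop", "ball1", "roomb", "left")]
```

Running `run(PLAN)` produces a state that contains the goal predicate, whereas a plan starting with Drop is rejected because Robby carries nothing in the initial state.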


For classical deterministic planning problems described in a STRIPS-like language,
BDD-based deterministic planning involves two orthogonal problems. The first is to
represent the planning domain compactly. The second is to compute a BDD representation of
the transition relation and perform a BDD-based exploration of the state space.

Representing  STRIPS  Domains  Compactly 

Consider the STRIPS description of the Gripper problem in Example 3.3. A simple way
to represent the transition relation of this problem with a BDD is to ground all predicates
and action schemas and use a Boolean state variable to represent each ground predicate.
However, this often results in a very redundant encoding that is impossible to handle
efficiently with BDDs. In order to encode STRIPS domains efficiently, we mainly follow
[50] and use three principles to compress the domain description.

Principle 1. The first principle is to remove predicates from the domain description that
do not change their truth value. These predicates are called static predicates. They are
typically used to represent typing information like the static predicates room(R), ball(B),
and gripper(G) in the Gripper problem. It is simple to identify static predicates by
checking that they are not included in a delete or add set of any action. Moreover, since we
already know their truth value by inspecting the initial state, they can be abstracted away in
a compressed encoding of the transition relation.
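
The check for static predicates is simple enough to sketch directly. The dictionary `EFFECTS` below lists only the predicate names occurring in the add and delete sets of the Gripper schemas in Figure 3.2; this layout and the function name `static_predicates` are assumptions for illustration.

```python
# Predicate names in the add and delete sets of the Gripper schemas (Figure 3.2).
EFFECTS = {
    "Move": {"add": ["atRobby"], "del": ["atRobby"]},
    "Pick": {"add": ["carry"], "del": ["at", "free"]},
    "Drop": {"add": ["at", "free"], "del": ["carry"]},
}
PREDICATES = ["room", "ball", "gripper", "atRobby", "at", "free", "carry"]

def static_predicates(predicates, effects):
    """Predicates that appear in no add or delete set never change truth value."""
    changed = {p for eff in effects.values() for p in eff["add"] + eff["del"]}
    return [p for p in predicates if p not in changed]
```

For the Gripper domain this returns the three typing predicates room, ball, and gripper.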

Principle  2.  The  second  principle  is  to  find  the  domain  of  predicate  arguments  and  action 
schema  parameters  which  is  often  a  small  subset  of  the  total  set  of  objects  O.  The  static 
predicates  may  provide  information  to  restrict  these  domains,  but  there  can  be  constraints 
on  the  domains  that  only  can  be  found  by  finding  the  set  of  reachable  states  from  the  initial 
state.  Computing  these  states  is  as  hard  as  solving  the  planning  problem  itself.  Instead,  the 
set  of  reachable  states  can  be  approximated  by  a  relaxed  reachability  analysis  where  the 
delete  set  of  actions  is  ignored.  This  estimate  will  always  include  the  reachable  states.  If 
implemented  carefully,  the  analysis  can  be  carried  out  in  a  small  fraction  of  the  time  needed 
to  solve  the  complete  planning  problem  [50]. 
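
A minimal sketch of the relaxed reachability analysis, assuming ground actions are given as (pre, add, del) triples over string atoms: the delete sets are simply ignored, so the set of reached atoms only grows. The hand-grounded action list, including one ill-typed Move instance, is hypothetical and chosen to show how unreachable atoms restrict the argument domains.

```python
def relaxed_reachable_atoms(initial, actions):
    """Fixpoint of applying actions while ignoring their delete sets."""
    reached = set(initial)
    changed = True
    while changed:
        changed = False
        for pre, add, _delete in actions:
            if pre <= reached and not add <= reached:
                reached |= add
                changed = True
    return reached

# A few hand-grounded Gripper actions (pre, add, del); the last one is ill-typed.
ACTIONS = [
    ({"atRobby(rooma)"}, {"atRobby(roomb)"}, {"atRobby(rooma)"}),
    ({"atRobby(roomb)"}, {"atRobby(rooma)"}, {"atRobby(roomb)"}),
    ({"at(ball1,rooma)", "atRobby(rooma)", "free(left)"},
     {"carry(ball1,left)"}, {"at(ball1,rooma)", "free(left)"}),
    ({"carry(ball1,left)", "atRobby(roomb)"},
     {"at(ball1,roomb)", "free(left)"}, {"carry(ball1,left)"}),
    ({"atRobby(ball1)"}, {"atRobby(left)"}, {"atRobby(ball1)"}),
]
INITIAL = {"atRobby(rooma)", "at(ball1,rooma)", "free(left)"}

REACHED = relaxed_reachable_atoms(INITIAL, ACTIONS)
```

The result over-approximates the truly reachable atoms (it contains atRobby(rooma) and atRobby(roomb) together, which no real state does), but it still excludes atRobby(left), so the argument of atRobby can be restricted to the two rooms.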

Principle 3. The third principle is to use numerical state variables instead of predicates to
represent locations of objects. Predicates often encode physical locations of objects. In the
Gripper problem, the four predicates at(ball1, rooma), at(ball1, roomb), carry(ball1, left),
and carry(ball1, right) encode the four possible locations of the ball. However, since the
ball can be at no more than one location at a time, we only need a single numerical state
variable represented by $\log_2 4 = 2$ bits to represent the truth value of these predicates. Sets
of predicates with this property are called single-valued [64] or balanced [50]. Balanced
predicates can be found automatically by generating candidate sets of predicates and
proving by induction that they are balanced. The base case of this proof is to show that they are
balanced in the initial state. The inductive step is to show that each action preserves their
balance [141].
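
To make the saving concrete, the sketch below encodes the four ball-location predicates of the Gripper problem as one two-bit variable. The ordering of the atoms in `BALL_LOCATIONS` and the helper names are arbitrary assumptions; any fixed enumeration of the balanced set would do.

```python
BALL_LOCATIONS = ["at(ball1,rooma)", "at(ball1,roomb)",
                  "carry(ball1,left)", "carry(ball1,right)"]

def bits_needed(n_values):
    """Number of Boolean variables for a variable with n_values values."""
    return max(1, (n_values - 1).bit_length())

def encode(state, group=BALL_LOCATIONS):
    """Map the single true atom of a balanced group to its bit vector."""
    true_atoms = [a for a in group if a in state]
    if len(true_atoms) != 1:
        raise ValueError("group is not balanced in this state")
    index, width = group.index(true_atoms[0]), bits_needed(len(group))
    return tuple((index >> k) & 1 for k in reversed(range(width)))

def decode(bits, group=BALL_LOCATIONS):
    """Inverse of encode: recover the atom that is true."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return group[index]
```

Four Boolean variables, one per ground predicate, are thus replaced by two, and states in which the ball is at several locations at once can no longer be represented.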

When  given  a  planning  problem  defined  in  the  STRIPS  part  of  the  PDDL  language,  the 
BIFROST  search  engine  described  in  Appendix  A  can  perform  these  three  analysis  steps 
automatically.  The  complete  analysis  can  normally  be  carried  out  in  a  small  fraction  of  the 
total  time  needed  to  solve  the  planning  problem. 

Encoding  STRIPS  Domains  with  BDDs 

In  order  to  compute  a  BDD  representation  of  the  transition  relation,  we  first  observe  that 
a  deterministic  planning  domain  is  an  asynchronous  system  in  the  sense  that  only  a  single 
action  is  active  in  each  step.  Thus,  as  described  in  Section  2.3.1,  a  disjunctive  partitioning 
of  the  transition  relation  can  be  used  to  lower  the  complexity  of  the  image  and  preimage 
computation. For each action $i$ in the compressed encoding of the domain, we get a
subrelation

$R_i(\vec{x}_i, \vec{y}\,'_i) = \bigwedge_{x \in pre_i} x \;\wedge \bigwedge_{y \in add_i} y' \;\wedge \bigwedge_{y \in del_i} \neg y'$.  (3.1)

To conform to the definition of the transition relation, the subrelation should contain
information about which action the transitions are associated with. However, as described
below, we can use the partitioning itself to hold this information when extracting the
actions of a solution. Saving variables in the encoding of the transition relation is important
for keeping the complexity of the image and preimage computations low.
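
The idea of a disjunctive partitioning can be sketched with explicit sets of states standing in for BDDs. Each subrelation is one ground action given as (pre, add, del); the image of a state set is the union of the per-action images, and keeping the results keyed by action name is what later allows actions to be recovered without encoding them in the relation. The action names and the dictionary layout are assumptions for illustration only.

```python
def image(states, subrelations):
    """Per-action images of a set of states (each state is a frozenset of atoms)."""
    successors = {}
    for name, (pre, add, delete) in subrelations.items():
        for s in states:
            if pre <= s:  # conjunction over pre_i holds in the current state
                nxt = frozenset((s | add) - delete)  # add_i true, del_i false
                successors.setdefault(name, set()).add(nxt)
    return successors

# Three hand-grounded Gripper actions as subrelations (pre, add, del).
ACTIONS = {
    "move_a_b": (frozenset({"atRobby(rooma)"}),
                 frozenset({"atRobby(roomb)"}),
                 frozenset({"atRobby(rooma)"})),
    "pick_left_a": (frozenset({"at(ball1,rooma)", "atRobby(rooma)", "free(left)"}),
                    frozenset({"carry(ball1,left)"}),
                    frozenset({"at(ball1,rooma)", "free(left)"})),
    "drop_left_b": (frozenset({"carry(ball1,left)", "atRobby(roomb)"}),
                    frozenset({"at(ball1,roomb)", "free(left)"}),
                    frozenset({"carry(ball1,left)"})),
}
INIT = frozenset({"atRobby(rooma)", "at(ball1,rooma)", "free(left)"})

BY_ACTION = image({INIT}, ACTIONS)
FULL_IMAGE = set().union(*BY_ACTION.values())  # disjunction of the partitions
```

Unlike Equation 3.1, which constrains only the variables an action mentions, this explicit version copies the unchanged atoms into the successor state, since a set-based state has to be complete.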

Due to the large number of small BDDs representing the action relations, it is almost
always an advantage to combine them into larger partitions. Care must be taken to merge the
subrelations such that partitions that only modify a small subset of variables are produced.
It is hard to produce optimal solutions to this problem. However, an approximation that
works satisfactorily in practice is to sort the subrelations according to which variables they
modify and merge them from left to right according to a threshold on the size of the BDD
representing the resulting partition. Typical “good” values of the threshold are in the range
5000 to 10000 BDD nodes. As described in Section 2.2, the size of a BDD is sensitive
to the variable ordering. In general, related variables should be close to each other in the
ordering. Since the current and next state variables of a state variable almost always are
highly dependent, it is often beneficial to interleave them in the ordering. However, no
previous work has addressed how the state variables of planning domains should be ordered.
Heuristics like fan-in and weight for constructing good variable orders of combinational
circuits [121] still need to be developed for BDD-based planning. In practice, the natural
ordering of state variables of a planning domain often turns out to be efficient, since it
reflects the semantics of the variables. Otherwise, the dynamic re-ordering techniques of the
BDD package can be used to find good orderings.

3.1.2  Planning  Algorithms 

Given a BDD representation of the transition relation, it is simple to use the image and
preimage computation to implement optimal breadth-first forward, backward, and
bidirectional search. The forward and backward search algorithms are special cases of the
bidirectional search algorithm shown in Figure 3.4. In each iteration, the algorithm
computes the frontier states in either the forward or the backward direction. The set reached
contains all explored states and is used to prune a new frontier from previously visited
states. If the set of pruned frontier states is empty, the algorithm returns “no solution
exists”. If an overlap between the forward and backward search frontier is found, the
algorithm extracts and returns a solution. Otherwise the search continues.

function Bidirectional Breadth-First Search(s_0, G)
    reached ← ∅
    forwardFrontier_0 ← {s_0}; i ← 0
    backwardFrontier_0 ← G; j ← 0
    while forwardFrontier_i ∩ backwardFrontier_j = ∅
        if TIME(forwardFrontier_i) < TIME(backwardFrontier_j)
            i ← i + 1
            forwardFrontier_i ← IMG(forwardFrontier_{i-1}) \ reached
            reached ← reached ∪ forwardFrontier_i
            if forwardFrontier_i = ∅ return “no solution exists”
        else
            j ← j + 1
            backwardFrontier_j ← PREIMG(backwardFrontier_{j-1}) \ reached
            reached ← reached ∪ backwardFrontier_j
            if backwardFrontier_j = ∅ return “no solution exists”
    return EXTRACTSOLUTION(forwardFrontier, backwardFrontier)

Figure 3.4: BDD-based Bidirectional Breadth-First Search.

A good heuristic for deciding in which direction to expand the search is simply to choose
the direction where the previous frontier took least time to compute [51]. When using this
heuristic, bidirectional search has similar
or  better  performance  than  both  forward  and  backward  search,  since  it  will  adapt  to  one  of 
these  algorithms  if  the  frontiers  always  are  faster  to  compute  in  a  particular  direction.  The 
complexity of extracting a solution is normally much lower than the complexity of
computing the sequence of expansions of the search frontier. To see this, consider having found
an overlap between the forward and backward search frontier. For each state s in the
overlap, there exists an optimal solution passing through s. Consequently, we can pick a single
state  in  the  overlap  and  trace  its  associated  optimal  solution.  To  find  the  part  of  the  solution 
from  s  to  a  goal  state,  images  of  s  are  intersected  with  the  backward  search  frontiers.  For 
each  of  these  computations,  the  image  only  needs  to  be  computed  for  a  single  state.  This 
can  be  done  very  fast  relative  to  the  time  needed  for  computing  images  during  search  since 
the  BDD  of  a  single  state  is  small.  When  performing  these  image  computations  a  version 
of  the  transition  relation  is  employed  where  no  subrelations  of  actions  have  been  merged. 
Since  each  subrelation  is  associated  with  a  particular  action,  this  transition  relation  can  be 
used  to  extract  the  actions  of  the  solution  path.  To  extract  the  part  of  the  plan  leading  from 
the  initial  state  to  s  a  similar  sequence  of  preimage  computations  is  carried  out. 
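
The search and the solution extraction described above can be sketched with explicit state sets in place of BDDs. This is an illustration under stated assumptions rather than the thesis implementation: the transition relation is a set of (state, action, next state) triples, each direction keeps its own reached set (a deliberate deviation from the single shared set in Figure 3.4, so that a frontier is never pruned by states visited from the other side), and the names `bidirectional_bfs` and `extract_solution` are hypothetical.

```python
import time

# Transition relation of Example 3.1 as (state, action, next_state) triples.
TRANSITIONS = {
    ("A", "beta", "B"), ("B", "gamma", "D"), ("C", "alpha", "A"),
    ("C", "beta", "D"), ("D", "alpha", "C"),
}

def extract_solution(transitions, fwd, bwd):
    """Trace one plan through a state in the overlap of the last two frontiers."""
    state = min(fwd[-1] & bwd[-1])  # any state in the overlap would do
    plan, cur = [], state
    for layer in reversed(fwd[:-1]):  # back toward the initial state
        cur, action = min((s, a) for (s, a, t) in transitions
                          if t == cur and s in layer)
        plan.insert(0, action)
    cur = state
    for layer in reversed(bwd[:-1]):  # on toward a goal state
        action, cur = min((a, t) for (s, a, t) in transitions
                          if s == cur and t in layer)
        plan.append(action)
    return plan

def bidirectional_bfs(transitions, s0, goals):
    def img(states):
        return {t for (s, _a, t) in transitions if s in states}
    def preimg(states):
        return {s for (s, _a, t) in transitions if t in states}

    fwd, bwd = [{s0}], [set(goals)]  # frontier layers per direction
    fwd_reached, bwd_reached = {s0}, set(goals)
    fwd_time = bwd_time = 0.0  # time spent on the previous frontier
    while not (fwd[-1] & bwd[-1]):
        forward = fwd_time < bwd_time
        start = time.perf_counter()
        if forward:
            frontier = img(fwd[-1]) - fwd_reached
            fwd_reached |= frontier
            fwd.append(frontier)
        else:
            frontier = preimg(bwd[-1]) - bwd_reached
            bwd_reached |= frontier
            bwd.append(frontier)
        elapsed = time.perf_counter() - start
        if forward:
            fwd_time = elapsed
        else:
            bwd_time = elapsed
        if not frontier:
            return None  # no solution exists
    return extract_solution(transitions, fwd, bwd)
```

Because each frontier layer holds exactly the states at one distance from the initial state or from the goal, the first overlap between the current layers lies on a minimum-length plan regardless of the order in which the two directions are expanded.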

The growth rate of the search frontier usually depends highly on the search
direction. For most problems, the backward growth rate is substantially larger than the forward.
The  reason  for  this  is  that  the  states  reached  from  the  initial  state  always  are  legal  states  of 
the  system  modeled  by  the  domain,  while  the  states  reached  from  the  goal  states  may  be 
illegal  states  of  the  system.  Thus,  a  regular  structure  of  the  modeled  system  may  only  be 
reflected  in  the  forward  search  frontier.  The  difference  in  growth  rate  often  disappears  if 
the  set  of  goal  states  is  reduced  to  legal  system  states. 

A major problem of BDD-based planning is a high growth rate of BDDs representing
the search frontier [86]. Frontier set simplification can be used to address this problem
(see Section 2.3.1). However, the technique does not seem to work well on typical planning
problems.

Since the reachability analysis that forms the core of symbolic model checking
resembles the state space search performed by the bidirectional search algorithm shown in
Figure 3.4, we would expect that the BDD package parameters should be adjusted similarly for
symbolic model checking and planning. No systematic experiments have been carried out
to confirm this, but our experiences with deterministic planning problems fit well with the
hypothesis that: 1) a planning problem initiated with a good variable order seems always
to perform better without dynamic variable reordering, 2) each garbage collection seems
to impair performance by deleting nodes that later must be recomputed, and 3) too small a
cache can cause a performance degradation of several factors (a cache size that works well
in practice is about 10 percent of the total number of allocated nodes). However, there also
seem to be significant differences between typical model checking problems and planning
problems. The BDDs representing the search frontier of a typical planning problem often
grow very fast compared to the BDDs representing the search frontier of a typical symbolic
model checking problem [119]. The reason seems to be that there are several subtle
differences between typical verification problems and planning problems. First of all, planning
problems tend to be combinatorially hard compared to formal verification problems.
Verification often considers digital circuits and software descriptions that are large compared to
the logical problem they contain. Planning problems, on the other hand, are normally fairly
dense representations of a combinatorial problem. Second, the graph diameter of planning
domains is often larger than the graph diameter of the domains studied in formal
verification. The reason is that planning problems normally involve sequencing a large number of
actions, while symbolic model checking problems typically consider synchronous systems
where a global state change can happen in each iteration.


3.2 Non-Deterministic Planning

A deterministic action can lead to at most one possible next state. A more general model
is to assume that the outcome of actions is uncertain such that actions may lead to one of
several possible next states. For instance, when the robot in the Gripper domain picks a
ball,  it  may  be  that  it  either  succeeds  and  holds  the  ball  in  its  gripper  in  the  next  state, 
or  it  fails  and  ends  in  a  state  where  the  ball  still  is  on  the  floor  and  the  gripper  is  empty. 
Markov  decision  processes  (MDPs)  model  such  non-determinism  by  defining  the  effect  of 
actions  as  a  probability  distribution  over  the  state  space.  We  will  consider  a  simpler  model 
of  non-determinism  where  the  effect  of  an  action  is  defined  by  a  set  of  possible  next  states. 
Unless  an  alternative  interpretation  is  clear  from  the  context,  the  term  non-determinism  will 
be  used  to  refer  to  this  particular  model. 

Non-determinism can model a wide range of dynamic systems [3]. Common to all of
them is that an active environment interacts with the actions. The environment can, for
instance, cause otherwise deterministic actions to fail, or it can control a subset of the actions.
In the latter case these uncontrollable actions may either be interleaved or simultaneous
with controllable actions. In both cases, the environment can be modeled by
non-deterministic controllable actions.

A non-deterministic planning domain is similar to a deterministic planning domain
except that the actions may be non-deterministic.

Definition 3.4 (Non-Deterministic Planning Domain) A non-deterministic planning
domain is a tuple $\langle S, Act, \rightarrow \rangle$ where $S$ is a finite set of states, $Act$ is a finite set of actions,
and $\rightarrow\ \subseteq S \times Act \times S$ is a non-deterministic transition relation of action effects. Instead
of $(s, a, s') \in\ \rightarrow$, we write $s \xrightarrow{a} s'$.

The set of next states of an action $a$ applied in state $s$ is given by

$\mathrm{Next}(s, a) = \{s' : s \xrightarrow{a} s'\}$.  (3.2)

An action $a$ is called applicable in state $s$ iff $\mathrm{Next}(s, a) \neq \emptyset$. The set of applicable actions
in a state $s$ is given by

$\mathrm{App}(s) = \{a : \mathrm{Next}(s, a) \neq \emptyset\}$.  (3.3)
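
Equations 3.2 and 3.3 translate directly into set comprehensions. The sketch below uses the transition relation of Example 3.4 from later in this section, stored as explicit triples; the Python names `next_states` and `applicable` are assumptions.

```python
# Transition relation of Example 3.4 as (state, action, next_state) triples.
TRANSITIONS = {
    ("A", "beta", "B"), ("B", "gamma", "D"), ("C", "alpha", "A"),
    ("C", "alpha", "D"), ("D", "beta", "C"),
}

def next_states(s, a, transitions=TRANSITIONS):
    """Next(s, a): all states that action a may lead to from s."""
    return {t for (p, b, t) in transitions if p == s and b == a}

def applicable(s, transitions=TRANSITIONS):
    """App(s): all actions with a non-empty set of next states in s."""
    return {b for (p, b, _t) in transitions if p == s}
```

Here `next_states("C", "alpha")` contains both A and D, which is exactly what makes the alpha action non-deterministic.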

A  non-deterministic  planning  problem  is  similar  to  a  deterministic  planning  problem. 

Definition 3.5 (Non-Deterministic Planning Problem) A non-deterministic planning
problem is a tuple $\langle \mathcal{D}, s_0, G \rangle$² where $\mathcal{D}$ is a non-deterministic planning domain, $s_0 \in S$ is an
initial state, and $G \subseteq S$ is a set of goal states.

² Several of the non-deterministic algorithms introduced in the thesis can handle uncertainty about the
initial state represented by a set of initial states $S_0$. However, for the sake of simplicity of the presentation,
we assume the initial state to be fully known.

A non-deterministic plan could be a sequence of actions that is guaranteed to reach a goal
state regardless of the non-determinism of the domain. That is, for all uncertain action
effects, the execution of the plan leads to a goal state. Such plans are called conformant plans
[68, 35]. However, since conformant plans seldom exist, we define a non-deterministic plan
to be a set of state-action pairs defined below.

Definition 3.6 (State-action pair (SA)) Let $\mathcal{D}$ be a non-deterministic planning domain. A
state-action pair $(s, a)$ of $\mathcal{D}$ is a state $s \in S$ associated with an applicable action $a \in
\mathrm{App}(s)$.

A set of SAs defines a function from states to sets of actions relevant to apply in order
to reach a goal state. This definition is identical to the state-action table definition used in
[36, 37, 42, 34] and is similar to universal plans [154], policies in reinforcement learning
(e.g., [122]), and strategies in concurrent reachability games [43].

Definition 3.7 (Non-Deterministic Plan) Let $\mathcal{D}$ be a non-deterministic planning domain.
A non-deterministic plan for $\mathcal{D}$ is a set of state-action pairs of $\mathcal{D}$.

States  are  assumed  to  be  fully  observable.  An  execution  of  a  non-deterministic  plan  is 
an  alternation  between  observing  the  current  state  and  choosing  an  action  to  apply  from 
the  set  of  actions  associated  with  the  state.  Similar  to  policies  in  game  theory,  we  call  a 
non-deterministic  plan  static  if  there  is  at  most  a  single  action  associated  with  each  state. 
Otherwise,  we  call  it  dynamic,  since  an  agent  executing  the  plan  may  change  its  preference 
about  which  action  to  apply. 

The set of states covered by a plan $\pi$ is

$\mathrm{States}(\pi) = \{s : \exists a .\ (s, a) \in \pi\}$.  (3.4)

The set of actions in a plan $\pi$ associated with a state $s$ is

$\mathrm{Act}(\pi, s) = \{a : (s, a) \in \pi\}$.  (3.5)

The closure of a plan $\pi$ is the set of possible end states

$\mathrm{Closure}(\pi) = \{s' \notin \mathrm{States}(\pi) : \exists (s, a) \in \pi .\ s' \in \mathrm{Next}(s, a)\}$.  (3.6)
A plan $\pi$ is said to be total iff $\mathrm{Closure}(\pi) \subseteq G$.

Example 3.4 A non-deterministic version of the deterministic planning problem described
in Example 3.1 is shown in Figure 3.5. We have

$S = \{A, B, C, D\}$,
$Act = \{\alpha, \beta, \gamma\}$,
$\rightarrow\ = \{(A, \beta, B), (B, \gamma, D), (C, \alpha, A), (C, \alpha, D), (D, \beta, C)\}$,
$s_0 = C$,
$G = \{B\}$.

Notice that the $\alpha$ action is non-deterministic since it can lead to two states from $s_0$. A plan
for solving the problem could be $\pi = \{(C, \alpha), (D, \beta), (A, \beta)\}$. We have $\mathrm{States}(\pi) =
\{C, A, D\}$ and $\mathrm{Closure}(\pi) = \{B\} \subseteq G$; thus, $\pi$ is total. ◊


Figure 3.5: A non-deterministic planning problem with four states $A$, $B$, $C$, and $D$
and three actions $\alpha$ (dashed), $\beta$ (solid), and $\gamma$ (dotted). The initial state is $C$ while the
set of goal states is the singleton set $\{B\}$.
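
The plan of Example 3.4 can be checked against Equations 3.4 to 3.6 with a few lines of code. This is a minimal sketch in which plans are sets of (state, action) pairs and the transition relation is a set of triples; all function names are assumptions introduced for the illustration.

```python
# Transition relation and plan of Example 3.4.
TRANSITIONS = {
    ("A", "beta", "B"), ("B", "gamma", "D"), ("C", "alpha", "A"),
    ("C", "alpha", "D"), ("D", "beta", "C"),
}
PLAN = {("C", "alpha"), ("D", "beta"), ("A", "beta")}
GOALS = {"B"}

def next_states(s, a):
    return {t for (p, b, t) in TRANSITIONS if p == s and b == a}

def states(plan):
    """States(pi): the states covered by the plan."""
    return {s for (s, _a) in plan}

def actions(plan, s):
    """Act(pi, s): the actions the plan associates with state s."""
    return {a for (p, a) in plan if p == s}

def closure(plan):
    """Closure(pi): possible end states, i.e. next states not covered by the plan."""
    return {t for (s, a) in plan for t in next_states(s, a)} - states(plan)

def is_total(plan, goals):
    return closure(plan) <= goals
```

The plan covers C, A, and D, its only possible end state is B, and it is therefore total with respect to the goal set {B}.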


Notice that the definition of a non-deterministic plan does not give any guarantees about
goal achievement. The reason is that, in contrast to deterministic plans, it is natural to
define a range of solution classes. There currently exist three classes of non-deterministic
plans called weak, strong cyclic, and strong [36, 37]. Following [42, 34], we use CTL to
define these solutions. First, we need to define a Kripke structure to represent the execution
behavior of a plan.


Definition 3.8 (Execution Model) An execution model with respect to a non-deterministic
plan $\pi$ for the domain $\mathcal{D} = \langle S, Act, \rightarrow \rangle$ is a Kripke structure $\mathcal{M}(\pi) = \langle S, R \rangle$ where

• $S = \mathrm{Closure}(\pi) \cup \mathrm{States}(\pi) \cup G$,

• $(s, s') \in R$ iff $s \notin G$, $\exists a .\ (s, a) \in \pi$ and $s \xrightarrow{a} s'$, or $s = s'$ and $s \in \mathrm{Closure}(\pi) \cup G$.

Figure 3.6: The execution model of the plan in Example 3.4.

Example 3.5 The execution model of the plan in Example 3.4 is shown in Figure 3.6. It
has $S = \{A, B, C, D\}$ and $R = \{(A, B), (B, B), (C, A), (C, D), (D, C)\}$. ◊

Notice that all execution paths are infinite, which is required in order to define solutions
in CTL. If a state is reached that is not covered by the plan (e.g., a goal state or a dead
end), the postfix of the execution path from this state is an infinite repetition of it. Given a
Kripke structure defining the execution of a plan, weak, strong cyclic, and strong plans are
defined by the CTL formulas below.

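Definition 3.9 (Weak, Strong Cyclic, and Strong Plans) Given a non-deterministic
planning problem $\mathcal{P} = \langle \mathcal{D}, s_0, G \rangle$ and a plan $\pi$ for $\mathcal{D}$,

• $\pi$ is a weak solution iff $\mathcal{M}(\pi), s_0 \models \mathrm{EF}\, G$,

• $\pi$ is a strong cyclic solution iff $\mathcal{M}(\pi), s_0 \models \mathrm{AG}\,\mathrm{EF}\, G$,

• $\pi$ is a strong solution iff $\mathcal{M}(\pi), s_0 \models \mathrm{AF}\, G$.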

An execution of a strong plan is guaranteed to stay within states covered by the plan until
it reaches a goal state after a finite number of steps. An execution of a strong cyclic plan is
also guaranteed to reach only states covered by the plan or goal states. However, due to
cycles, it may never reach a goal state. An execution of a weak plan may reach states not
covered by the plan; it only guarantees that some execution exists that reaches the goal from
the initial state.
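
The three solution classes can be checked on small examples by building the execution model of Definition 3.8 explicitly and evaluating the CTL operators as fixpoints over sets of states. The sketch below is an explicit-state illustration under that assumption, not the BDD-based algorithms of the thesis; the function names are hypothetical.

```python
def execution_model(plan, transitions, goals):
    """Kripke structure (S, R) of Definition 3.8 for a plan."""
    def nxt(s, a):
        return {t for (p, b, t) in transitions if p == s and b == a}
    covered = {s for (s, _a) in plan}
    closure = {t for (s, a) in plan for t in nxt(s, a)} - covered
    S = closure | covered | set(goals)
    R = {(s, t) for (s, a) in plan if s not in goals for t in nxt(s, a)}
    R |= {(s, s) for s in closure | set(goals)}  # self-loops make paths infinite
    return S, R

def succ(R, s):
    return {t for (p, t) in R if p == s}

def EF(S, R, target):
    """States from which some path reaches target (least fixpoint)."""
    z = set(target) & S
    while True:
        new = z | {s for (s, t) in R if t in z}
        if new == z:
            return z
        z = new

def AF(S, R, target):
    """States from which every path reaches target (least fixpoint)."""
    z = set(target) & S
    while True:
        new = z | {s for s in S if succ(R, s) and succ(R, s) <= z}
        if new == z:
            return z
        z = new

def AG(S, R, target):
    """States from which every path stays inside target (greatest fixpoint)."""
    z = set(target) & S
    while True:
        new = {s for s in z if succ(R, s) <= z}
        if new == z:
            return z
        z = new

def classify(plan, transitions, s0, goals):
    S, R = execution_model(plan, transitions, goals)
    ef = EF(S, R, goals)
    return {"weak": s0 in ef,
            "strong_cyclic": s0 in AG(S, R, ef),
            "strong": s0 in AF(S, R, goals)}

# Problem and plan of Example 3.4.
TRANSITIONS = {
    ("A", "beta", "B"), ("B", "gamma", "D"), ("C", "alpha", "A"),
    ("C", "alpha", "D"), ("D", "beta", "C"),
}
PLAN = {("C", "alpha"), ("D", "beta"), ("A", "beta")}
```

For the plan of Example 3.4 the execution model coincides with Example 3.5, and the plan is a weak and a strong cyclic solution but not a strong one, because the execution may cycle between C and D forever.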

3.2.1  Encoding  NADL  Domains 

Compared with the wide range of deterministic planning languages, the number of
non-deterministic planning languages is limited. The work presented in this thesis rests on the
Non-deterministic Agent Domain Language (NADL) [93] due to its explicit representation
of environment actions.

NADL was developed as input language to the Universal Multi-agent OBDD-based
Planner (UMOP) [93]. An NADL planning problem consists of: a set of state variables, a
description of system and environment agents, and a specification of an initial and goal
condition. The set of state variable assignments defines the state space of the domain. An
agent’s description is a set of actions. The agents change the state of the world by
performing actions that are assumed to be executed synchronously and to have a fixed and equal
duration. At each step, all of the agents perform exactly one action, and the resulting action
tuple is a joint action. The system agents are assumed to be controllable, while the
environment agents model the uncontrollable world. A valid domain description requires that
the system and environment agents modify disjoint sets of state variables. Otherwise they
may be able to control each other through their choice of actions. An action has three parts:
a set of modified state variables, a precondition formula, and an effect formula. The next
state value of the modified variables is defined by the effect formula and may depend on
the value of the current state variables. During execution, the action has exclusive access
to the modified state variables and it cannot change the value of any other state variables.
In order for the action to be applicable, the precondition formula must be satisfied in the
current state. The values of state variables not modified by a joint action are unchanged.
The initial and goal condition are formulas that must be satisfied in the initial state and the
goal states, respectively. We assume that the initial condition only represents a single state.

Example 3.6 An NADL problem is shown in Figure 3.7. The problem has two state variables:
a numerical one, the position pos, and a propositional one, power. The position is a
natural number that can be represented by three Boolean variables. This gives pos the
domain {0, 1, 2, 3, 4, 5, 6, 7}. The system is a robot moving between the eight positions. It has
two actions Right and Left. The Right and Left actions have conditional effects described
by an if-then-else operator ($\rightarrow$). If the power is on (that is, power is true), they increase
or decrease the position, otherwise they cause no position change. The Right action is
non-deterministic. It may increase the position by either one or two. The Left action
is deterministic. It always decreases the position by one. The environment is a human
that controls the power with two actions On and Off. Since the system and environment
must apply exactly one action at each step, there are four joint actions Left-On, Left-Off,
Right-On, and Right-Off. Initially, the power is on and the robot is at position 0.

variables
  nat(3) pos
  bool power
system
  agt: Robot
    Right
      mod: pos
      pre: pos < 6
      eff: power → (pos' = pos + 1 ∨ pos' = pos + 2), pos' = pos
    Left
      mod: pos
      pre: pos > 0
      eff: power → pos' = pos − 1, pos' = pos
environment
  agt: Human
    On
      mod: power
      pre: ¬power
      eff: power'
    Off
      mod: power
      pre: power
      eff: ¬power'
initially
  pos = 0 ∧ power
goal
  pos = 7

Figure 3.7: An NADL planning problem.

There are two sources of non-determinism in NADL domains. The first is non-deterministic
actions not constraining all their modified variables to a single value in the next state.
The second is the uncontrollable actions of the environment. We define actions to be
interfering if either

1. they have inconsistent effects, or

2. they constrain an overlapping set of state variables.

The first condition is due to the fact that state knowledge is expressed in a monotonic logic
that cannot represent inconsistent knowledge. The second condition addresses the problem
of sharing resources. We assume that each state variable can be accessed by at most one
action at each step, even if the effects of several actions are consistent.

Abstract Syntax of NADL

The abstract syntax of an NADL description is a 7-tuple $D = (SV, S, E, Act, d, I, G)$
where

• $SV = BVar \cup NVar$ is a finite set of state variables comprised of a finite set of
Boolean variables, $BVar$, and a finite set of numerical variables with finite domains,
$NVar$,

• $S$ is a finite, nonempty set of system agents,

• $E$ is a finite set of environment agents,

• $Act$ is a set of action descriptions $(mod, pre, eff)$ where $mod$ is the set of state
variables modified by the action, $pre$ is a precondition state formula in the set $SForm$
and $eff$ is an effect formula in the set $Form$. The sets $SForm$ and $Form$ are defined
below,

• $d : Agt \rightarrow 2^{Act}$ is a function mapping agents ($Agt = S \cup E$) to their actions,

• $I \in SForm$ is the initial condition,

• $G \in SForm$ is the goal condition.

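For a valid domain description, we require that the actions of system agents and the actions
of environment agents modify disjoint sets of variables

$$\bigcup_{\alpha_s \in S,\ a \in d(\alpha_s)} mod_a \;\cap \bigcup_{\alpha_e \in E,\ a \in d(\alpha_e)} mod_a = \emptyset.$$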

The set of formulas $Form$ is constructed from the following alphabet of symbols

• A finite set of current state $v$ and next state $v'$ variables, where $v, v' \in SV$,

• The natural numbers $\mathbb{N}$,

• The arithmetic operators $+$, $-$, $/$, and $*$,

• The relation operators $>$, $<$, $\le$, $\ge$, $=$ and $\ne$,

• The Boolean operators $\neg$, $\vee$, $\wedge$, $\Rightarrow$, $\Leftrightarrow$ and $\rightarrow$,

• The special symbols true, false, parentheses and comma.




Arithmetic expressions are defined inductively by

• Every numerical state variable $v \in NVar$ is an arithmetic expression,

• A natural number is an arithmetic expression,

• If $e_1$ and $e_2$ are arithmetic expressions and $\oplus$ is an arithmetic operator, then
$e_1 \oplus e_2$ is an arithmetic expression.

Finally, formulas $Form$ are defined inductively by

• true and false are formulas,

• Boolean state variables $v \in BVar$ are formulas,

• If $e_1$ and $e_2$ are arithmetic expressions and $\mathcal{R}$ is a relation operator, then
$e_1 \mathrel{\mathcal{R}} e_2$ is a formula,

• If $f_1$, $f_2$ and $f_3$ are formulas, so are $(\neg f_1)$, $(f_1 \vee f_2)$, $(f_1 \wedge f_2)$,
$(f_1 \Rightarrow f_2)$, $(f_1 \Leftrightarrow f_2)$ and $(f_1 \rightarrow f_2, f_3)$.

Parentheses have their usual meaning and operators have their usual priority and
associativity, with the if-then-else operator "$\rightarrow$" given lowest priority. $SForm \subset Form$
is the subset of the formulas referring only to current state variables. All of the symbols in
the alphabet of formulas have their usual meaning, with the if-then-else operator
$f_1 \rightarrow f_2, f_3$ being an abbreviation for $(f_1 \wedge f_2) \vee (\neg f_1 \wedge f_3)$.

The domain of a numerical state variable $v \in NVar$ is given by $dom(v) = \{0, 1, \ldots, t_v\}$,
where $t_v > 0$. Let $JAct_s$ and $JAct_e$ denote the set of joint actions of system agents and
environment agents, respectively

$$JAct_s = \prod_{\alpha_s \in S} d(\alpha_s),$$

$$JAct_e = \prod_{\alpha_e \in E} d(\alpha_e).$$

Moreover, let $JAct$ denote the set of joint actions of system and environment agents,
$JAct = JAct_s \times JAct_e$.

Encoding  NADL  Domains  with  BDDs 

An NADL description $(SV, S, E, Act, d, I, G)$ represents a non-deterministic planning
problem $\mathcal{P}^{nd} = (\mathcal{D}^{nd}, s_0^{nd}, G^{nd})$ where $\mathcal{D}^{nd} = (S^{nd}, Act^{nd}, \rightarrow)$ and

• $S^{nd} = \mathbb{B}^{|BVar|} \times \prod_{v \in NVar} dom(v)$,

• $Act^{nd} = JAct_s$,

• $s_0^{nd}$ is the state satisfying $I(s_0^{nd})$,

• $G^{nd} = \{ s : G(s) \}$,

• $s \xrightarrow{j_s} s'$ iff $R(s, j_s, s')$.

The transition relation is given by

$$R(s, j_s, s') = \exists j_e \in JAct_e .\ R(s, j, s')$$

where $j \in JAct$ is the joint action of the system and environment actions $j_s$ and $j_e$ given by
$j = (j_{s_1}, \ldots, j_{s_{|S|}}, j_{e_1}, \ldots, j_{e_{|E|}})$ and $R(s, j, s')$ is the transition relation of the joint
system and environment actions. $R(s, j, s')$ is a conjunction of three relations $A$, $F$, and $I$

$$R(s, j, s') = A(s, j, s') \wedge F(s, j, s') \wedge I(j).$$

$A$ defines the constraints on the current state $s$ and next state $s'$ caused by the actions in the
joint action $j$. $A$ further ensures that actions with inconsistent effects cannot be applied
concurrently, since $A$ reduces to false if any pair of actions in $j$ has inconsistent effects. Thus,
$A$ also ensures the first condition for avoiding interference between concurrent actions. We
have

$$A(s, j, s') = \bigwedge_{i=1}^{|Agt|} \left( pre_{j_i}(s) \wedge eff_{j_i}(s, s') \right).$$

$F$ is a frame relation ensuring that unmodified variables are unchanged

$$F(s, j, s') = \bigwedge_{v \in SV \setminus C} (v = v')$$

where $C = \bigcup_{i=1}^{|Agt|} mod_{j_i}$.

Finally, $I$ ensures the second condition for avoiding interference between concurrent
actions

$$I(j) = \bigwedge_{i \ne k} \left( mod_{j_i} \cap mod_{j_k} = \emptyset \right).$$
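To make the three conjuncts concrete, the following sketch enumerates the transition relation of the robot domain of Figure 3.7 explicitly instead of building BDDs. It is only an illustration under stated assumptions: states are `(pos, power)` tuples, each action is a `(mod, pre, eff)` triple of Python callables, and all function and variable names are hypothetical.

```python
from itertools import product

# States are (pos, power) pairs; pos ranges over the nat(3) domain 0..7.
STATES = [(pos, power) for pos in range(8) for power in (False, True)]
VAR_INDEX = {"pos": 0, "power": 1}

# Each action is (mod, pre, eff): the modified variables, a precondition on the
# current state s, and an effect relating s to the next state t.
SYSTEM = {
    "Right": ({"pos"}, lambda s: s[0] < 6,
              lambda s, t: t[0] in (s[0] + 1, s[0] + 2) if s[1] else t[0] == s[0]),
    "Left": ({"pos"}, lambda s: s[0] > 0,
             lambda s, t: t[0] == s[0] - 1 if s[1] else t[0] == s[0]),
}
ENVIRONMENT = {
    "On": ({"power"}, lambda s: not s[1], lambda s, t: t[1]),
    "Off": ({"power"}, lambda s: s[1], lambda s, t: not t[1]),
}
JOINT_ACTIONS = list(product(SYSTEM, ENVIRONMENT))  # JAct = JAct_s x JAct_e


def joint_relation(s, j, t):
    """R(s, j, s') = A and F and I for the joint action j = (js, je)."""
    actions = [SYSTEM[j[0]], ENVIRONMENT[j[1]]]
    mods = [mod for mod, _, _ in actions]
    if mods[0] & mods[1]:                       # I(j): mod sets must be disjoint
        return False
    if not all(pre(s) and eff(s, t) for _, pre, eff in actions):  # A(s, j, s')
        return False
    changed = mods[0] | mods[1]
    return all(s[i] == t[i]                     # F(s, j, s'): frame relation
               for var, i in VAR_INDEX.items() if var not in changed)


def system_relation(s, js, t):
    """R(s, js, s'): some environment action je completes the joint action."""
    return any(joint_relation(s, (js, je), t) for je in ENVIRONMENT)


def next_states(s, js):
    """All states reachable from s when the system applies js."""
    return sorted(t for t in STATES if system_relation(s, js, t))
```

In the initial state `(0, True)` only the environment action Off is applicable, so `next_states((0, True), "Right")` yields the two states with position 1 or 2 and the power switched off, while Left is not applicable at position 0.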

An NADL domain corresponds to a synchronous system where each agent is an
activity. From the discussion in Section 2.3.1, we can therefore expect to be able to use a
conjunctive partitioning to represent the transition relation of the domain. The definition of
$R(s, j, s')$ above verifies this, since $R(s, j, s')$ is a conjunction of subexpressions. However,
the existential quantification in the expression

$$R(s, j_s, s') = \exists j_e \in JAct_e .\ R(s, j, s')$$

does not distribute over the conjunction of subexpressions in $R(s, j, s')$. It is possible,
though, to use the early quantification technique explained in Section 2.3.1 to obtain a
conjunctive partitioning of $R(s, j_s, s')$ by moving subexpressions out of the scope of the
existential quantification.

Notice  that,  for  non-deterministic  planning,  the  BDD  encoding  of  the  transition  relation 
needs  to  be  complete  in  the  sense  that  the  action  of  a  transition  must  be  encoded  in  the  BDD. 
For  deterministic  planning,  the  disjunctive  partitioning  of  the  transition  relation  could  be 
used  to  represent  actions.  This  is  not  possible  for  non-deterministic  planning  since  the 
search  algorithms  reason  about  state-action  pairs  (SAs)  and  synthesize  plans  represented 
by  a  set  of  SAs. 

In the remainder of the thesis, we focus on simple multi-agent problems with a single
system agent and at most one environment agent. We have developed a specialized version
of NADL called NADL+ where the system and the environment are each represented by a set
of actions instead of a set of agents. In addition, NADL+ has features to support guided
BDD-based search and failure effects of actions. If no environment actions exist, it is
straightforward to encode the transition relation of an NADL+ domain as a disjunctive
partitioning. Otherwise, if environment actions exist, we "flatten" the action descriptions
by computing each joint action, which again makes it possible to represent the transition
relation as a disjunctive partitioning. This can be done efficiently due to the relatively small
number of joint actions. NADL+ is described in more detail in Appendix A.

3.2.2  Planning  Algorithms 

Weak, strong cyclic, and strong plans can be synthesized by a backward breadth-first search
from the goal states to the initial state. The search algorithm is shown in Figure 3.8. In
each iteration (l.2-7), a precomponent Pc of the plan is computed from the states C currently
covered by the plan. If the precomponent is empty, a fixed point of P has been reached
that does not cover the initial state and "no solution exists" is returned. Otherwise, the
precomponent is added to the plan and the states in the precomponent are added to the set
of covered states (l.6-7). The precomponent function must fulfill the specification given
below.

Definition 3.10 (Precomponent Function) A valid precomponent function
$PreComp(C) : 2^S \rightarrow 2^{S \times Act}$ must terminate. In addition, for any state-action pair
$(s, a) \in PreComp(C)$, we have $s \notin C$.

function NDP(s0, G)
1  P ← ∅; C ← G
2  while s0 ∉ C
3    Pc ← PreComp(C)
4    if Pc = ∅ then return "no solution exists"
5    else
6      P ← P ∪ Pc
7      C ← C ∪ States(Pc)
8  return P

Figure 3.8: A generic algorithm for synthesizing non-deterministic plans.

Since  the  set  of  states  is  finite  and  the  precomponent  function  terminates,  P  must  eventually 
reach  a  maximum  size.  Thus,  it  can  be  shown  that  NDP  terminates. 

Theorem  3.1  (Termination)  NDP  terminates. 

Proof.  Given  in  Appendix  B  □ 

The strong, strong cyclic, and weak planning algorithms differ only in the definition of
the precomponent. The core operation is to find the preimage where states are associated
with actions

$$PreImgSA(C) = \exists v' .\ R(v, a, v') \wedge C(v)[v/v']. \qquad (3.7)$$

As a set computation $PreImgSA(C)$ is defined by

$$PreImgSA(C) = \{ (s, a) : Next(s, a) \cap C \ne \emptyset \}. \qquad (3.8)$$

The weak and strong precomponents are the sets of SAs given by

$$PreCompW(C) = PreImgSA(C) \setminus C \times Act \qquad (3.9)$$

$$PreCompS(C) = (PreImgSA(C) \setminus PreImgSA(\overline{C})) \setminus C \times Act \qquad (3.10)$$

where $\overline{C}$ denotes the complement of $C$.
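The generic algorithm and the weak and strong precomponents can be sketched with explicit Python sets in place of BDDs. This is a minimal illustration under stated assumptions, not the implementation used in the thesis: `trans` is a hypothetical dictionary mapping each state-action pair to its set of next states, the strong precomponent keeps the pairs that can reach C and removes those that can also reach a state outside C, and `None` stands for "no solution exists". The three-state domain is made up.

```python
def pre_img_sa(trans, C):
    """PreImgSA(C): state-action pairs with at least one next state in C."""
    return {sa for sa, succ in trans.items() if succ & C}


def pre_comp_weak(trans, C):
    """Weak precomponent: pairs that may reach C, excluding states already in C."""
    return {(s, a) for (s, a) in pre_img_sa(trans, C) if s not in C}


def pre_comp_strong(trans, states, C):
    """Strong precomponent: pairs whose every next state lies in C."""
    must_reach = pre_img_sa(trans, C) - pre_img_sa(trans, states - C)
    return {(s, a) for (s, a) in must_reach if s not in C}


def ndp(s0, goal, pre_comp):
    """Backward breadth-first plan synthesis, parameterized by the precomponent."""
    plan, covered = set(), set(goal)
    while s0 not in covered:
        pc = pre_comp(covered)
        if not pc:
            return None                      # "no solution exists"
        plan |= pc
        covered |= {s for s, _ in pc}
    return plan


# Hypothetical domain: "solid" is deterministic, "dashed" may loop in state 0.
states = {0, 1, 2}
trans = {(0, "solid"): {1}, (1, "solid"): {2}, (0, "dashed"): {0, 2}}

weak_plan = ndp(0, {2}, lambda C: pre_comp_weak(trans, C))
strong_plan = ndp(0, {2}, lambda C: pre_comp_strong(trans, states, C))
```

In this domain the weak search covers state 0 after one iteration by including the `dashed` pair, while the strong search needs two iterations and uses only `solid` pairs.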

The strong cyclic precomponent PreCompSC(C) can be generated by iteratively extending
a set of candidate SAs (wSA) and pruning it until a fixed point is reached [34]. The
precomponent function is shown in Figure 3.9.

function PreCompSC(C)
1  wSA ← ∅
2  repeat
3    OldwSA ← wSA
4    wSA ← PreImgSA(States(wSA) ∪ C) \ C × Act
5    scSA ← SCPlanAux(wSA, C)
6  until scSA ≠ ∅ ∨ wSA = OldwSA
7  return scSA

function SCPlanAux(startSA, C)
1  SA ← startSA
2  repeat
3    OldSA ← SA
4    SA ← PruneOutgoing(SA, C)
5    SA ← PruneUnconnected(SA, C)
6  until SA = OldSA
7  return SA

function PruneOutgoing(SA, C)
1  NewSA ← SA \ PreImgSA(S \ (C ∪ States(SA)))
2  return NewSA

function PruneUnconnected(SA, C)
1  NewSA ← ∅
2  repeat
3    OldSA ← NewSA
4    NewSA ← SA ∩ PreImgSA(C ∪ States(NewSA))
5  until NewSA = OldSA
6  return NewSA

Figure 3.9: The strong cyclic precomponent function. In PreImgSA of PruneOutgoing, S is the
set of all states, so the argument is the complement of C ∪ States(SA).

Let Weak, StrongCyclic, and Strong denote the NDP algorithm using PreCompW,
PreCompSC, and PreCompS, respectively. It is shown in Appendix B that
Weak, StrongCyclic, and Strong are sound and complete and have valid precomponents.
Since we have shown that the generic non-deterministic algorithm then terminates, we
have

Theorem 3.2 (Correctness of Weak, StrongCyclic, and Strong) The Weak, StrongCyclic,
and Strong planning algorithms are correct. The algorithms return "no solution
exists" iff no solution exists, otherwise they return a valid solution.

Proof.  This  follows  from  the  soundness,  completeness,  and  termination  theorems  of  each 
algorithm  proven  in  Appendix  B.  □ 
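The strong cyclic precomponent of Figure 3.9 can also be sketched with explicit sets. As before, this is an illustration under assumptions rather than the BDD-based implementation: `trans` is a hypothetical dictionary from state-action pairs to sets of next states, `states` is the full state set used to form complements, and the four-state example domain is made up.

```python
def pre_img_sa(trans, C):
    """State-action pairs with at least one next state in C."""
    return {sa for sa, succ in trans.items() if succ & C}


def states_of(sas):
    return {s for s, _ in sas}


def prune_outgoing(trans, states, sas, C):
    """Drop pairs that may lead outside C and the states covered by sas."""
    outside = states - (C | states_of(sas))
    return sas - pre_img_sa(trans, outside)


def prune_unconnected(trans, sas, C):
    """Keep only pairs from which C is reachable using pairs in sas."""
    new = set()
    while True:
        old = new
        new = sas & pre_img_sa(trans, C | states_of(new))
        if new == old:
            return new


def sc_plan_aux(trans, states, start_sas, C):
    """Alternate the two pruning steps until a fixed point is reached."""
    sas = set(start_sas)
    while True:
        old = sas
        sas = prune_outgoing(trans, states, sas, C)
        sas = prune_unconnected(trans, sas, C)
        if sas == old:
            return sas


def pre_comp_sc(trans, states, C):
    """Grow the weak candidate set until pruning leaves a non-empty result."""
    wsa = set()
    while True:
        old_wsa = wsa
        wsa = {(s, a) for (s, a) in pre_img_sa(trans, states_of(wsa) | C)
               if s not in C}
        sc_sa = sc_plan_aux(trans, states, wsa, C)
        if sc_sa or wsa == old_wsa:
            return sc_sa


# Hypothetical domain: state 2 is the goal, state 3 is a dead end.
states = {0, 1, 2, 3}
trans = {(0, "try"): {0, 1}, (0, "risky"): {2, 3}, (1, "go"): {2}}
```

With the goal set `{2}`, the first precomponent contains only `(1, "go")`. With `{1, 2}` covered, the next precomponent is `{(0, "try")}`: the pair `(0, "risky")` is pruned because it may reach the dead end, whereas `try` is kept even though it may loop in state 0.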

Due to the breadth-first search carried out by the non-deterministic planning algorithm,
weak solutions have minimum length best-case execution paths and strong solutions have
minimum length worst-case execution paths [34]. Formally, for a non-deterministic planning
domain $\mathcal{D}$ and a plan $\pi$ of $\mathcal{D}$ let

$$\mathrm{Exec}(s, \pi) = \{ q : q \text{ is a path of } \mathcal{M}(\pi) \text{ and } q_0 = s \} \qquad (3.11)$$

denote the set of execution paths of $\pi$ starting at $s$. Let the length of a path $q = q_0 q_1 \cdots$
with respect to a set of states $C$ be defined by

$$|q|_C = \begin{cases} i & \text{if } q_i \in C \text{ and } q_j \notin C \text{ for } 0 \le j < i \\ \infty & \text{otherwise.} \end{cases} \qquad (3.12)$$

We will say that an execution path $q$ reaches a state $s$ iff $|q|_{\{s\}} \ne \infty$. In addition, we will call
a state $s$ connected to a set of states $C$ by a plan $\pi$ iff $\mathcal{M}(\pi), s \models \mathrm{EF}\, C$. Let $\mathrm{Min}(s, C, \pi)$
and $\mathrm{Max}(s, C, \pi)$ denote the minimum and maximum length of an execution path from $s$
to $C$ of a plan $\pi$

$$\mathrm{Min}(s, C, \pi) = \min_{q \in \mathrm{Exec}(s, \pi)} |q|_C \qquad (3.13)$$

$$\mathrm{Max}(s, C, \pi) = \max_{q \in \mathrm{Exec}(s, \pi)} |q|_C. \qquad (3.14)$$

Similarly, let $\Pi$ denote the set of all plans of $\mathcal{D}$ and let $\mathrm{WDist}(s, C)$ (weak distance) and
$\mathrm{SDist}(s, C)$ (strong distance) denote the minimum of $\mathrm{Min}(s, C, \pi)$ and $\mathrm{Max}(s, C, \pi)$ for
any plan $\pi \in \Pi$ of $\mathcal{D}$

$$\mathrm{WDist}(s, C) = \min_{\pi \in \Pi} \mathrm{Min}(s, C, \pi) \qquad (3.15)$$

$$\mathrm{SDist}(s, C) = \min_{\pi \in \Pi} \mathrm{Max}(s, C, \pi). \qquad (3.16)$$

It can be shown that the Weak and Strong algorithms are optimal with respect to weak
and strong distance.

Theorem 3.3 (Optimality of Weak and Strong)

• If $\pi$ is a solution returned by Weak($s_0$, $G$) then $\mathrm{Min}(s_0, G, \pi) = \mathrm{WDist}(s_0, G)$.

• If $\pi$ is a solution returned by Strong($s_0$, $G$) then $\mathrm{Max}(s_0, G, \pi) = \mathrm{SDist}(s_0, G)$.

Proof. Follows from the optimality proofs of Weak and Strong given in Appendix B. □

A limitation of strong cyclic and strong planning compared to weak planning is that
strong cyclic and strong plans often do not exist because it is impossible to avoid dead
ends. Consider generating a non-deterministic plan for a system that can be in a set of bad
states, a set of good states, or a set of irrecoverable failed states (dead-ends). Assume that
there exist actions that can bring the system from any bad state to a good state. However,
these actions may fail and cause transitions to bad states or even irrecoverable failed states
(see Figure 3.10). Neither a strong nor a strong cyclic plan can be found since an irrecoverable
state can be reached from any initial state. Only a weak plan exists for this problem.
However, weak plans are mostly useless since actions are chosen without reasoning about
their worst-case behavior.

Figure 3.10: System with irrecoverable states. [Diagram of three state sets: bad states,
good states, and dead-end states.]

Another limitation of strong and strong cyclic plans is their inherent pessimism. Consider
for example the domain illustrated in Figure 3.11. The domain consists of $n + 1$ states
and two different actions (dashed and solid). The only strong cyclic and strong solution is
$\{(0, solid), (1, solid), \ldots, (n-1, solid)\}$. There is a single execution path associated with
this plan that reaches the goal state in $n$ steps. However, a weak plan $\{(0, dashed)\}$ may be
preferable since the probability of its best-case execution of length 1 may be much higher
than that of its worst-case execution of infinite length.

Figure 3.11: A domain with two actions (drawn as solid and dashed arrows) illustrating
the possible loss of short execution paths. I and G are the initial and goal state,
respectively.

Chapter 4

State-Set Branching

In this chapter, we introduce a new framework called state-set branching [88, 89, 90, 91].
State-set branching combines BDD-based search and heuristic search. The philosophy
of state-set branching is that the information represented by BDDs must be semantically
closely related in order for the BDD operations to work efficiently. Hence, we separate the
representation of information used to guide the search algorithm from the representation
of states and transitions and only use BDDs to encode the latter. The framework has two
independent parts: a modification of the Best-First Search algorithm (BFS) described in
Section 2.4 to a new algorithm called Best-Set-First Search (BSFS), and an efficient
BDD-based implementation of this algorithm based on a partitioning of the transition relation
called branching partitioning. In Section 4.1, we introduce the BSFS algorithm and show
that it applies to any classical BFS algorithm, any transition cost function, heuristic
function, and node-evaluation function. In Section 4.2, we define branching partitioning and
describe how this new BDD partitioning technique can be used to implement the BSFS
algorithm. Finally, Section 4.3 describes an experimental evaluation of two implementations
of A* called ghSetA* and fSetA*. The performance of these algorithms is compared
to unguided BDD-based search, ordinary single-state A*, and BDDA*, the only previous
BDD-based implementation of A*. The evaluation includes 8 search domains ranging from
VLSI-design with synchronous actions, to classical AI problems such as $(n^2 - 1)$-Puzzles,
Blocks World and problems used in the AIPS 1998, 2000 and 2002 planning competitions
[113, 4, 115]. We apply four different families of heuristic functions ranging from the
minimum Hamming distance to the sum of Manhattan distances for the $(n^2 - 1)$-Puzzle, and
HSPr [20] for planning problems. The experimental evaluation shows that ghSetA* and
fSetA* consistently outperform single-state A*, except when the heuristic is very strong.
In addition, we show that state-set branching can improve the complexity of single-state
search exponentially and that it often dominates both single-state A* and blind BDD-based
search by several orders of magnitude. Moreover, it consistently outperforms BDDA*.

4.1 Best-Set-First Search

The  Best-Set-First  Search  (BSFS)  algorithm  generalizes  BFS  to  build  a  search  tree  where 
each  search  node  contains  a  set  of  states  associated  with  the  same  search  information.  There 
are  two  main  properties  of  BDD-based  search  techniques  that  this  algorithm  exhibits. 

1.  It  can  exploit  the  ability  of  the  image  computation  to  find  next  states  of  a  set  of  states 
effectively  when  expanding  a  search  node. 

2. Since a search node contains states associated with the same search information, it
can avoid using inefficient symbolic arithmetic operations to find the search
information associated with states in child nodes.

BSFS can implement heuristic search tree algorithms where the search node information
used to prioritize the node expansion can be computed by associating each transition
$(s, a, s')$ of the search domain with a change $\delta I(s, a, s')$ of the search node information.
Thus, if $s$ belongs to a node with information $I$ and $s'$ is reached with transition $(s, a, s')$
then $s'$ belongs to a search node with information $I + \delta I(s, a, s')$. For A*, the search node
information can be one or two dimensional: either it is the $f$-value or the $g$- and $h$-value. In
the first case, $\delta I(s, a, s')$ is the change in $f$-value caused by the transition.

Example 4.1 The $\delta h$, $\delta g$, and resulting $\delta f$ values of the problem introduced in Example 2.8
are shown in Figure 4.1.

Figure 4.1: The $\delta I$-values of the search problem introduced in Example 2.8.


The BSFS algorithm shown in Figure 4.2 is almost identical to the ordinary BFS
algorithm defined in Figure 2.8. However, the state-set version builds a search tree during
the search process where each search node contains a set of states. Multiple states in each
node emerge because child nodes with identical node information are coalesced by the
StateSetExpand function in line 6 and because the EnqueueAndMerge function
may merge child nodes with nodes on the frontier queue having identical node
information. The StateSetExpand function is defined in Figure 4.3.

function BSFS(s0, I0, G)
1  frontier ← MakeQueue(⟨{s0}, I0⟩)
2  loop
3    if |frontier| = 0 then return "no solution exists"
4    ⟨S, I⟩ ← RemoveTop(frontier)
5    if S ∩ G ≠ ∅ then return ExtractSolution(frontier, ⟨S ∩ G, I⟩)
6    frontier ← EnqueueAndMerge(frontier, StateSetExpand(⟨S, I⟩))

Figure 4.2: The Best-Set-First Search (BSFS) algorithm.

function StateSetExpand(⟨S, I⟩)
1  child ← emptyMap
2  foreach state s in S
3    foreach transition (s, a, s')
4      Ic ← I + δI(s, a, s')
5      child[Ic] ← child[Ic] ∪ {s'}
6  return MakeNodes(child)

Figure 4.3: The StateSetExpand function.


Child  states  with  node  information  I  are  stored  in  child[I].  The  outgoing  transitions 
from  each  state  in  the  parent  node  are  used  to  find  all  successor  states.  The  function 
MakeNodes  called  at  line  6  constructs  the  child  nodes  from  the  completed  child  map. 
Each  child  node  contains  states  with  the  same  search  information.  However,  there  may 
exist  several  nodes  with  the  same  node  information.  In  addition,  MakeNodes  may  prune 
some  of  the  child  states  (e.g.,  to  implement  cycle  detection  in  A*). 
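The expansion step and the main loop can be sketched with explicit sets in place of BDDs. The sketch below makes several simplifying assumptions that are not part of the algorithm as stated: node information is a single number (an f-value, using the A* choice of I0 = h(s0) and a change of c(a) + h(s') − h(s)), the frontier merges all nodes with identical information, no states are pruned (so the loop may not terminate on cyclic graphs without a solution), and the function returns the goal states and their node information instead of extracting a plan. The graph, costs, and heuristic values are made up.

```python
import heapq
from collections import defaultdict


def state_set_expand(node, transitions, delta_i):
    """Group the successors of all states in the node by their node information."""
    states, info = node
    child = defaultdict(set)
    for s in states:
        for a, t in transitions.get(s, ()):
            child[info + delta_i(s, a, t)].add(t)
    return [(frozenset(ts), i) for i, ts in child.items()]


def bsfs(s0, i0, goal, transitions, delta_i):
    """Best-set-first search over merged state sets (no solution extraction)."""
    frontier = {i0: {s0}}          # node information -> merged state set
    queue = [i0]                   # min-heap over node information
    while queue:
        info = heapq.heappop(queue)             # RemoveTop
        states = frontier.pop(info)
        hit = states & goal
        if hit:
            return hit, info
        for child_states, child_info in state_set_expand(
                (states, info), transitions, delta_i):
            if child_info not in frontier:      # EnqueueAndMerge
                frontier[child_info] = set()
                heapq.heappush(queue, child_info)
            frontier[child_info] |= child_states
    return None                                 # "no solution exists"


# Hypothetical search problem: each state maps to (action, next state) pairs.
transitions = {
    "A": [("ab", "B"), ("ac", "C")],
    "B": [("bg", "G")],
    "C": [("cg", "G")],
}
cost = {"ab": 1, "ac": 1, "bg": 1, "cg": 3}
h = {"A": 2, "B": 1, "C": 1, "G": 0}


def delta_f(s, a, t):
    """Change in f-value caused by the transition (s, a, t)."""
    return cost[a] + h[t] - h[s]


result = bsfs("A", h["A"], {"G"}, transitions, delta_f)
```

In this run, B and C both receive f-value 2 and are expanded together as the single node ({B, C}, 2). The goal state G is first found in a node with f-value 2, which equals the cost of the cheapest path from A to G.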

Example 4.2 Figure 4.4 shows the search tree traversed by the BSFS algorithm for A*
applied to the problem in Example 4.1.

In order to introduce multiple states in each search node and reduce the number of search
nodes, the EnqueueAndMerge function of the BSFS algorithm may merge nodes on the
search frontier having identical search information. This, however, transforms the search
tree into a Directed Acyclic Graph (DAG). We will refer to this DAG as a search structure.

Figure 4.4: State-set search tree example.

Lemma 4.1 The search structure built by the BSFS algorithm is a DAG where every node
$\langle S', I' \rangle$ different from a root node $\langle \{s_0\}, I_0 \rangle$ has a set of predecessor nodes. For each state
$s' \in S'$ in such a node there exists an action $a$ and a predecessor $\langle S, I \rangle$ with a state $s \in S$
such that $s \xrightarrow{a} s'$ and $I' = I + \delta I(s, a, s')$.

Proof. By induction on the number of loop iterations, we get that the search structure after
the first iteration is a DAG consisting of a root node $\langle \{s_0\}, I_0 \rangle$. For the inductive step,
assume that the search structure is a DAG with the desired properties after $n$ iterations of
the loop (see Figure 4.2). If the algorithm in the next iteration terminates in line 3 or 5,
the search structure is unchanged and therefore a DAG with the required format. Assume
that the algorithm does not terminate and that $\langle S, I \rangle$ is the node removed from the top of
frontier. The node is expanded by forming child nodes with the StateSetExpand
function in line 6. According to the definition of this function, for any state $s' \in S'$ in a child
node $\langle S', I' \rangle$ there exists an action $a$ and some state $s \in S$ in $\langle S, I \rangle$ such that $s \xrightarrow{a} s'$ and
$I' = I + \delta I(s, a, s')$. Thus $\langle S, I \rangle$ is a valid predecessor for all states in the child nodes.
Furthermore, since all child nodes are new nodes, no cycles are created in the search structure
which therefore remains a DAG. If a child node is merged with an old node when enqueued
on frontier the resulting search structure is still a DAG because all nodes on frontier are
unexpanded and therefore have no successor nodes that can cause cycles. In addition, each
state in the resulting node obviously has the required predecessor nodes. □


Lemma 4.2 For each state $s' \in S'$ of a node $\langle S', I' \rangle$ in a finite search structure of the BSFS
algorithm there exists a path $q_0 \cdots q_n$ with associated actions $\pi = a_1 \cdots a_n$ in the search
domain such that $q_n = s'$ and $I' = I_0 + \sum_{i=1}^{n} \delta I(q_{i-1}, a_i, q_i)$.

Proof. We will construct the path by tracing the edges backwards in the search
structure. Let $p_n = s'$. According to Lemma 4.1 there exists a predecessor $\langle S, I \rangle$ to $\langle S', I' \rangle$
such that for some state $p_{n-1} \in S$ and action $a_n \in Act$ we have $p_{n-1} \xrightarrow{a_n} p_n$ and
$I' = I + \delta I(p_{n-1}, a_n, p_n)$. Continuing the backward traversal from $p_{n-1}$ must eventually
terminate since the search structure is finite and acyclic. Moreover, the traversal will terminate
at the root node because this is the only node without predecessors. Assume that the
backward traversal terminates after $n$ iterations. Then $q_0 \cdots q_n = p_0 \cdots p_n$ and $\pi = a_1 \cdots a_n$. □

The  ExtractSolution  function  in  line  5  of  the  BSFS  algorithm  uses  the  backward 
traversal  described  in  the  proof  of  Lemma  4.2  to  extract  a  solution.  We  can  now  prove 
soundness  of  the  BSFS  algorithm. 

Theorem 4.1 (Soundness of BSFS) If the BSFS algorithm returns a solution $\pi = a_1 \cdots a_n$
with associated path $q_0 \cdots q_n$ and the search node information of $q_n$'s search node is $I$ then
$\pi$ is a valid solution and $I = I_0 + \sum_{i=1}^{n} \delta I(q_{i-1}, a_i, q_i)$.

Proof. Since $q_n \in G$ it follows from Lemma 4.2 and the definition of ExtractSolution
that $\pi$ is a solution to the search problem and $I = I_0 + \sum_{i=1}^{n} \delta I(q_{i-1}, a_i, q_i)$. □

It is not possible to show that the BSFS algorithm in general is complete since it covers
incomplete algorithms such as pure heuristic search. However, it follows from the optimality
proofs below that BSFS is complete when implementing the A* algorithm.

Example  Implementations 

The  BSFS  algorithm  can  be  used  to  implement  all  classical  variants  of  the  BFS  algorithm 
including  pure  heuristic  search,  A*,  weighted  A*,  uniform  cost  search,  beam  search,  and 
hill  climbing.  With  some  modifications,  it  also  covers  iterative  deepening  heuristic  search 
algorithms  such  as  IDA*. 

Pure heuristic search is implemented by using the values of the heuristic function as
search node information and sorting the nodes on the frontier in ascending order such that
the top node contains states with least $h$-value. The search node information of the initial
state is $I_0 = h(s_0)$ and each transition $(s, a, s')$ is associated with the change in $h$, that
is, $\delta I(s, a, s') = h(s') - h(s)$. In each iteration, this pure heuristic search algorithm will
expand all states with least $h$-value on the frontier given that all nodes with identical $h$-value
are merged on the frontier queue.

A* can be implemented by setting $I_0 = h(s_0)$ and $\delta I(s, a, s') = c(a) + h(s') - h(s)$.
In this way, the search node information equals the $f$-value of the states belonging to the
nodes. Again, nodes on the frontier are sorted in ascending order. We call a particular
implementation of this algorithm, in which all nodes with identical $f$-value on the frontier queue
are merged, fSetA*. An A* implementation with cycle detection must keep track of
$g$ and $h$ separately and prune child states reached previously with a lower $g$-value. Thus,
$I_0 = (0, h(s_0))$ and $\delta I(s, a, s') = (c(a), h(s') - h(s))$. The frontier is, as usual, sorted
with respect to the evaluation function $f(n) = g(n) + h(n)$. The resulting
algorithm is called GhSetA*. Compared to fSetA*, GhSetA* does not merge nodes that
have identical $f$-value but different $g$- and $h$-values. In each iteration, it may therefore only
expand a subset of the states on the frontier with minimum $f$-value. A number of other
improvements have been integrated in GhSetA*. First, it uses a tie-breaking rule for nodes
with identical $f$-value that chooses the node with the least $h$-value. Thus, in situations
where all nodes on the frontier have $f(n) = C^*$, the algorithm focuses the search in a
DFS fashion. The reason is that a node at depth level $d$ in this situation must have greater
$h$-value than a node at level $d + 1$ due to the non-negative transition costs. In addition, it
merges two nodes on the frontier only if the space used by the resulting node is less than
an upper bound $u$. This may help to focus the search further in situations where the space
requirements of the frontier nodes grow fast with the search depth. Both GhSetA* and
fSetA* can easily be extended to the weighted A* algorithms described in Section 2.4.
Using an approach similar to Pearl [130], fSetA* and GhSetA* can be shown to be optimal
given an admissible heuristic. In particular this is true when using the trivial admissible
heuristic function $h(n) = 0$ of uniform cost search.
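
The bookkeeping behind the A* setting can be checked on a small case. The following Python sketch accumulates the search node information along one path, starting from $I_0 = h(s_0)$ and adding $\delta I(s, a, s') = c(a) + h(s') - h(s)$ per step, and verifies that the running value always equals $g + h$. The states, action costs, and heuristic values are made up for the illustration and are not taken from the thesis.

```python
# Hypothetical toy data: heuristic values and one path of (action, cost, next_state).
h = {"s0": 3, "s1": 2, "s2": 2, "goal": 0}
path = [("a1", 1, "s1"), ("a2", 2, "s2"), ("a3", 2, "goal")]


def delta_I(cost, s, s_next):
    """Search node information change for A*: c(a) + h(s') - h(s)."""
    return cost + h[s_next] - h[s]


def f_values_along(start, steps):
    """Return the information I after each step, starting from I_0 = h(start)."""
    I, g, s = h[start], 0, start
    values = [I]
    for _action, cost, s_next in steps:
        I += delta_I(cost, s, s_next)
        g += cost
        # The h-terms telescope, so I is exactly the f-value g + h.
        assert I == g + h[s_next]
        s = s_next
        values.append(I)
    return values


f_path = f_values_along("s0", path)
```

Because the $h$ terms cancel pairwise, only the cost sum and the final heuristic value remain, which is why a scalar $I$ suffices for fSetA* while GhSetA* keeps the pair $(g, h)$.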

Lemma 4.3 Assume fSetA* and GhSetA* apply an admissible heuristic and $q_0 \cdots q_n$ is
the path associated with an optimal solution $\pi = a_1 \cdots a_n$. Then at any time before fSetA*
and GhSetA* terminate there exists a frontier node $\langle S, I \rangle$ with $q_i \in S$ such that $I \le C^*$
and $q_0 \cdots q_i$ is the search path associated with $q_i$.

Proof. A node $\langle S, I \rangle$ containing $q_i$ with associated search path $q_0 \cdots q_i$ must be on the
frontier since a node containing $s_0$ was initially inserted on the frontier and fSetA* and
GhSetA* terminate if a node containing the goal state $s_n$ is removed from the frontier. We
have $I = cost(a_1 \cdots a_i) + h(q_i)$. The path $q_0 \cdots q_i$ is a prefix of an optimal solution, thus
$cost(a_1 \cdots a_i)$ must be the minimum cost of reaching $q_i$. Since the heuristic function is
admissible, we have $h(q_i) \le h^*(q_i)$, which gives $I \le C^*$. □


Theorem 4.2 (Optimality of fSetA* and GhSetA*) Given an admissible heuristic function,
fSetA* and GhSetA* are optimal.

Proof. Suppose fSetA* or GhSetA* terminates with a solution derived from a frontier
node with $I > C^*$. Since the node was at the top of the frontier queue, we have

$$I \le f(n) \quad \forall n \in \text{frontier}.$$

Thus, prior to termination, all nodes on the frontier satisfied $f(n) > C^*$. However, this
contradicts Lemma 4.3, which states that any optimal path has a node on the frontier at any time
prior to termination with $I \le C^*$. □

IDA* performs a depth-first search in the search tree bounded by a limit $f_{limit}$ on the
$f$-values of search nodes. Initially, $f_{limit}$ is equal to the $f$-value of the initial state. In each
iteration, $f_{limit}$ is increased by the minimum value that the previous search exceeded $f_{limit}$
by. A similar algorithm can be defined for a state-set search structure where child nodes
with identical $f$-values are combined.
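
For reference, the single-state form of IDA* described above can be sketched in Python as follows. This is a textbook-style sketch, not the state-set variant and not code from BIFROST; the function names, the toy graph, and the heuristic table are assumptions made for the example.

```python
import math


def ida_star(start, is_goal, successors, h):
    """Iterative deepening A*: depth-first search bounded by f_limit.

    successors(s) yields (action, cost, next_state) triples.
    Returns (plan, cost), or None if no solution exists.
    """
    def dfs(s, g, f_limit, on_path, plan):
        f = g + h(s)
        if f > f_limit:
            return None, f                  # report the f-value that broke the limit
        if is_goal(s):
            return (list(plan), g), f
        next_limit = math.inf
        for action, cost, t in successors(s):
            if t in on_path:                # skip cycles on the current path
                continue
            on_path.add(t)
            plan.append(action)
            found, exceeded = dfs(t, g + cost, f_limit, on_path, plan)
            on_path.remove(t)
            plan.pop()
            if found is not None:
                return found, exceeded
            next_limit = min(next_limit, exceeded)
        return None, next_limit

    f_limit = h(start)                      # initial limit: f-value of the initial state
    while f_limit < math.inf:
        found, next_limit = dfs(start, 0, f_limit, {start}, [])
        if found is not None:
            return found
        f_limit = next_limit                # smallest f-value that exceeded the old limit
    return None


# Hypothetical toy problem with an admissible heuristic.
graph = {
    "A": [("ab", 1, "B"), ("ac", 4, "C")],
    "B": [("bc", 1, "C")],
    "C": [("cd", 1, "D")],
    "D": [],
}
h_table = {"A": 2, "B": 2, "C": 1, "D": 0}
result = ida_star("A", lambda s: s == "D", lambda s: graph[s], lambda s: h_table[s])
```

On this toy graph the first iteration with $f_{limit} = 2$ fails, the limit is raised to 3, and the second iteration finds the plan of cost 3.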


4.2  BDD-Based  Implementation 

The  motivation  for  defining  the  BSFS  algorithm  is  that  it  can  be  efficiently  implemented 
with  BDDs.  In  this  section,  we  define  a  new  BDD  technique  called  branching  partitioning 
to  effectively  expand  search  nodes  where  the  sets  of  states  are  represented  by  BDDs. 

The  BDD-based  BSFS  algorithm  represents  the  states  in  each  search  node  by  a  BDD. 
This  may  lead  to  exponential  space  savings  compared  to  the  explicit  state  representation 
used by ordinary implementations of best-first search. However, if we want exponential
space savings to translate into exponential time savings, we also need an implicit
approach for computing the expand operation. The image computation can be applied to find
all  next  states  of  a  set  of  states  implicitly,  but  we  need  a  way  to  partition  the  next  states  into 
child  nodes  with  identical  node  information.  The  expand  operation  could  be  carried  out  in 
two  phases,  where  the  first  finds  all  the  next  states  using  the  image  computation,  and  the 
second  splits  this  set  of  states  into  child  nodes  [179].  A  more  efficient  approach,  however, 
is  to  split  up  the  image  computation  such  that  the  second  phase  is  integrated  in  the  first 
phase  without  a  significant  computational  overhead.  We  call  this  branching  partitioning. 

4.2.1  Disjunctive  Branching  Partitioning 

For  disjunctive  partitioning  the  approach  is  straight-forward.  We  simply  ensure  that  each 
partition  contains  transitions  with  the  same  search  information  change.  The  result  is  called 
a  disjunctive  branching  partitioning. 

Definition 4.1 (Disjunctive Branching Partitioning) A disjunctive branching partitioning
is a disjunctive partitioning $R_1(\vec x_1, \vec y'_1), \ldots, R_n(\vec x_n, \vec y'_n)$ where each subrelation
represents a set of transitions with the same search node information change.

Notice that there may exist several partitions with identical information change. In practice,
it is often more efficient to merge some of these partitions even though more variables
will be modified by the resulting partitions.

So far, an unresolved problem is how to find the search node information change of
each transition efficiently. It is intractable to compute $h(s)$ explicitly for each state since
the number of states grows exponentially with the number of state variables of the domain.
In practice, however, it turns out that $\delta h$ of an action often is independent of which state
it is applied in. This is not a coincidence. Heuristics are relaxations that typically are
based on ignoring interactions between actions in the domain. Thus, the effect of an action
can often be associated with a particular $\delta h$ value. In the worst case, it may be necessary to
encode the heuristic function symbolically with a BDD $h(\vec e, \vec v)$ where the vector of Boolean
variables $\vec e$ encodes the heuristic value in binary of the state represented by $\vec v$. We can then
compute $\delta h(s, s')$ symbolically with

$$\delta h(\vec v, \vec v', \vec d) = h(\vec e, \vec v) \wedge h(\vec e', \vec v') \wedge \vec d = \vec e' - \vec e \qquad (4.1)$$

where $\vec d$ encodes the value of $\delta h(s, s')$ in binary. This computation avoids iterating over
all states. In addition, it only needs to be carried out once prior to search. For all of
the heuristics studied in this thesis (including several classical heuristics), it has not been
necessary to perform this symbolic computation. Instead, the $\delta h$ value of each action has
been independent or close to independent of the state the action is applied in.
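
To make the role of Equation 4.1 concrete, the following Python sketch builds the same relation with explicit sets instead of BDDs and then uses it to split a transition relation by $\delta h$. Unlike the symbolic computation, this explicit version enumerates all state pairs, so it only illustrates what the relation contains. The heuristic table `h_rel` and the transitions `T` are hypothetical.

```python
from collections import defaultdict

# Hypothetical data: pairs (e, v) meaning h(v) = e, and transitions (v, v').
h_rel = {(2, "a"), (1, "b"), (1, "c"), (0, "goal")}
T = {("a", "b"), ("a", "c"), ("b", "goal"), ("c", "a")}


def delta_h(h_rel):
    """Explicit analogue of Equation 4.1: all (v, v', d) with d = h(v') - h(v)."""
    return {(v, v2, e2 - e) for (e, v) in h_rel for (e2, v2) in h_rel}


def split_by_delta_h(transitions, dh):
    """Group transitions by their heuristic change d."""
    parts = defaultdict(set)
    for v, v2, d in dh:
        if (v, v2) in transitions:
            parts[d].add((v, v2))
    return dict(parts)


parts = split_by_delta_h(T, delta_h(h_rel))
```

Each value of `parts` plays the role of one subrelation of a disjunctive branching partitioning for pure heuristic search.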

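Example 4.3 For the search problem in Example 4.1, we get at least three subrelations
corresponding to the three distinct $\delta f$-values

$\delta f_1 = 0$:  $R_1(\vec v, \vec v') = (\neg v_1 \wedge \neg v_2 \wedge v'_1 \wedge \neg v'_2) \vee (\neg v_1 \wedge v_2 \wedge \neg v'_1 \wedge \neg v'_2) \vee (\neg v_1 \wedge v_2 \wedge v'_1 \wedge v'_2)$

$\delta f_2 = 2$:  $R_2(\vec v, \vec v') = v_1 \wedge v_2 \wedge \neg v'_1 \wedge v'_2$

$\delta f_3 = 3$:  $R_3(\vec v, \vec v') = v_1 \wedge \neg v_2 \wedge v'_1 \wedge v'_2$. ◊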

Assume that the search node information change associated with subrelation $i$ is $\delta I_i$
and that there are $n$ subrelations. Let $\mathrm{IMG}_i(C)$ denote the image of the transitions in
subrelation $i$

$$\mathrm{IMG}_i(C) = \left(\exists \vec y_i \,.\, C(\vec v) \wedge R_i(\vec x_i, \vec y'_i)\right)[\vec y'_i / \vec y_i] \qquad (4.2)$$

The  StateSetExpand  function  in  Figure  4.3  can  then  be  implemented  with  BDDs  as 
shown  in  Figure  4.5. 

function DisjunctiveStateSetExpand(⟨S, I⟩)
    child ← emptyMap
    for i = 1 to n
        I_c ← I + δI_i
        child[I_c] ← child[I_c] ∪ IMG_i(S)
    return MakeNodes(child)

Figure 4.5: The StateSetExpand function for a disjunctive branching partitioning.
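
The control flow of the disjunctive expansion can be mirrored with explicit Python sets standing in for BDDs. This is only a sketch of the bookkeeping: the image of each partition is computed by enumeration here, whereas the BDD version computes it implicitly. The partitions reuse the three subrelations of Example 4.3 with states written as `(v1, v2)`; the node `S` and its information value `I = 1` are arbitrary choices for the example.

```python
from collections import defaultdict


def disjunctive_state_set_expand(S, I, partitions):
    """Explicit-set analogue of DisjunctiveStateSetExpand.

    S          : set of states in the search node
    I          : search node information of the node
    partitions : list of (delta_I, transitions); every (s, s_next) pair in
                 one partition has the same information change delta_I
    Returns a dict mapping child information I_c to the set of child states.
    """
    child = defaultdict(set)
    for delta_I, transitions in partitions:
        image = {t for (s, t) in transitions if s in S}   # stands in for IMG_i(S)
        if image:                                         # skip empty children
            child[I + delta_I] |= image                   # merge equal I_c values
    return dict(child)


# Subrelations R1, R2, R3 of Example 4.3 as explicit transition sets.
partitions = [
    (0, {((0, 0), (1, 0)), ((0, 1), (0, 0)), ((0, 1), (1, 1))}),
    (2, {((1, 1), (0, 1))}),
    (3, {((1, 0), (1, 1))}),
]
children = disjunctive_state_set_expand({(0, 0), (1, 0)}, 1, partitions)
```

Skipping empty images is a choice made in this sketch; the pseudocode leaves that detail to MakeNodes.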


4.2.2  Conjunctive  Branching  Partitioning 

An efficient implicit node expansion computation is also possible to define for a conjunctive
partitioning. Consider the synchronous composition of the $n$ subsystems in Figure 2.6.
Assume that the search node information change of a joint activity equals the sum of information
changes of each activity. We can then represent a conjunctive branching partitioning
as $n$ disjunctive branching partitionings where each disjunctive branching partitioning
represents the subrelations of the activities.

Definition 4.2 (Conjunctive Branching Partitioning) A conjunctive branching partitioning
$P_1, \ldots, P_n$ is a set of disjunctive branching partitionings

$$P_i = R_i^1(\vec x_i, \vec y'_i), \ldots, R_i^{r_i}(\vec x_i, \vec y'_i)$$

for $1 \le i \le n$.

Since the subsystems are synchronous, we require that the sets of variables in $\vec y'_1, \ldots, \vec y'_n$
form a partitioning of the state variables $\vec v'$. Assume that the search node information
change of $R_i^j(\vec x_i, \vec y'_i)$ is $\delta I_i^j$. Further let

$$\mathrm{SubComp}_i^j(\phi) = \exists \vec z_i \,.\, \phi(\vec v, \vec v') \wedge R_i^j(\vec x_i, \vec y'_i) \qquad (4.3)$$

where $\phi$ represents an intermediate computation result. As for an ordinary conjunctive
image computation, we require $\vec z_j \cap \bigcup_{i=j+1}^{n} \vec x_i = \emptyset$ for $1 \le j < n$ and $\bigcup_{j=1}^{n} \vec z_j = \vec v$.

The conjunctive state-set expansion function is then defined as shown in Figure 4.6.

function ConjunctiveStateSetExpand(⟨S, I⟩)
    child ← emptyMap
    child[I] ← S
    for i = 1 to n
        newChild ← emptyMap
        foreach entry ⟨φ, δI⟩ in child
            for j = 1 to r_i
                I_c ← δI + δI_i^j
                newChild[I_c] ← newChild[I_c] ∨ SubComp_i^j(φ)
        child ← newChild
    return MakeNodes(child)

Figure 4.6: The StateSetExpand function for a conjunctive branching partitioning.

The outer loop of the conjunctive state-set expansion function performs $n$ iterations. In iteration
$i$, the next value of the variables $\vec y_i$ is computed. In the end, the map child contains sets
of next states with identical search node information.¹ In the worst case, the number of
child nodes will grow exponentially with the number of activities. However, in practice
this blow-up of child nodes may be avoided due to the merging of nodes with identical
search node information during the computation.
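
The merging effect described above can be seen in a small explicit-set sketch of the conjunctive expansion. It is a simplification under stated assumptions: states are tuples with one component per subsystem, each activity is modelled as a Python function returning the possible next values of its component, and no early quantification is performed (the current state is carried along until the end). The two-counter system used as input is hypothetical.

```python
from collections import defaultdict


def conjunctive_state_set_expand(S, I, subsystems):
    """Explicit-set analogue of ConjunctiveStateSetExpand.

    S          : set of states, each a tuple with one component per subsystem
    I          : search node information of the node
    subsystems : subsystems[i] is a list of activities (delta_I, step), where
                 step(state) returns the set of possible next values of
                 component i when the activity is applied in `state`
    Returns a dict mapping child information to the set of next states.
    """
    # Each intermediate result is a set of (current_state, next_components_so_far).
    child = {I: {(s, ()) for s in S}}
    for activities in subsystems:                  # one outer iteration per subsystem
        new_child = defaultdict(set)
        for info, phi in child.items():
            for delta_I, step in activities:
                for s, partial in phi:
                    for value in step(s):
                        new_child[info + delta_I].add((s, partial + (value,)))
        child = new_child
    # Role of MakeNodes here: drop the current state, keep the completed next state.
    return {info: {nxt for (_s, nxt) in phi} for info, phi in child.items()}


def stay(i):
    return lambda s: {s[i]}


def inc(i):
    return lambda s: {s[i] + 1} if s[i] < 2 else set()


# Two hypothetical counters; "stay" changes the information by 0, "inc" by 1.
subsystems = [[(0, stay(i)), (1, inc(i))] for i in range(2)]
children = conjunctive_state_set_expand({(0, 0)}, 0, subsystems)
```

The four joint activities yield only three child nodes because the two joint activities with total change 1 are merged during the computation.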


4.3  Experimental  Evaluation 

Even though state-set branching applies to weighted A* and pure heuristic search, the
experimental evaluation focuses on evaluating the two implementations fSetA* and
GhSetA* of the A* algorithm. There are several reasons for this. First, we are interested
in finding optimal or near optimal solutions, and for pure heuristic search, the whole
emphasis would be on the quality of the heuristic function rather than the efficiency of the
search approach. Second, the behavior of A* has been extensively studied, and finally, we
wish to compare with the BDDA* algorithm. Readers interested in the performance of
state-set branching algorithms of weighted A* with weight settings other than $w = 0.5$ (see
Equation 2.17) are referred to [89].

¹ The function MakeNodes generates search nodes from the map. In addition, it substitutes the variables
of the BDDs encoding next states from primed to unprimed state variables.

All experiments have been carried out with the BIFROST 0.7 search engine using the
experimental setting described in Appendix A. The input to BIFROST is a search problem
defined in the STRIPS part of PDDL or NADL+ described in Appendix A where
action costs and heuristics can be defined. The performance of 6 algorithms GhSetA*,
fSetA*, Bidir, A*, BDDA*, and iBDDA* is investigated. The GhSetA*, fSetA*, and
Bidir search algorithms have been described in Section 4.1 and Section 3.1.2. The A*
algorithm manipulates and represents states explicitly. Due to the different state
representations, specialized versions have been made for the $(n^2 - 1)$-Puzzles, the $D^xV^yM^z$ domain,
and the $FG^k$ domain described below. In addition, a general version for PDDL planning
is implemented in BIFROST 0.7 and represents states as sets of facts and actions in the
usual STRIPS fashion. All of the single-state A* algorithms are implemented with cycle
detection. The BDDA* algorithm has been implemented in BIFROST 0.7 as described
in [53]. It is shown in Figure 4.7.

function BDDA*(s_0, G)
1  open(f, v) ← h(f, v) ∧ s_0(v)
2  while (open ≠ ∅)
3      (f_min, min(v), open'(f, v)) ← GoLeft(open)
4      if ∃v . min(v) ∧ G(v) return f_min
5      open''(f', v') ← ∃v . min(v) ∧ T(v, v') ∧
6          ∃e . h(e, v) ∧ ∃e' . h(e', v') ∧ (f' = f_min + e' - e + 1)
7      open(f, v) ← open'(f, v) ∨ open''(f', v')[f' \ f, v' \ v]

Figure 4.7: The BDDA* algorithm.

BDDA* can solve search problems only in domains with
unit transition costs. The search frontier is represented by a single BDD $open(\vec f, \vec v)$. This
BDD is the characteristic function of a set of states paired with their $f$-value. The state
is encoded as usual by a Boolean vector $\vec v$ and the $f$-value is encoded in binary by the
Boolean vector $\vec f$. Similarly to fSetA*, BDDA* expands all states $min(\vec v)$ with minimum
$f$-value in each iteration. The $f$-value of the child states is computed by arithmetic
operations at the BDD level (lines 5 and 6). The change in $h$-value is found by applying a
symbolic encoding of the heuristic function to the child and parent state. BDDA* is able
to find optimal solutions, but the algorithm only returns the path cost of such solutions. In
our implementation, we therefore added a function for tracing a solution backward. In the
domains we have investigated, this extraction function has low complexity, as do those for
GhSetA* and fSetA*. Our investigation of the BDDA* algorithm shows that it often can
be improved by

1. defining a computation of $open''$ using a disjunctive partitioned transition relation
instead of a monolithic transition relation as in line 5,

2. precomputing the arithmetic operation at the end of line 6 for each possible $f$-value,

3. interleaving the BDD variables of $\vec f$, $\vec e$, and $\vec e'$ to improve the arithmetic BDD operations,
and

4. moving this block of variables to the middle of the BDD variable ordering to reduce
the average distance to dependent state variables.

The last improvement is actually antagonistic to the recommendation of the BDDA*
inventors who locate the $\vec f$ variables at the beginning of the variable ordering to simplify
the GoLeft operation. However, we get up to a factor of two speed up with the four
modifications above. The algorithms are summarized in the table below.


GhSetA* : The GhSetA* algorithm with evaluation function $f(n) = g(n) + h(n)$.

fSetA*  : The fSetA* algorithm with evaluation function $f(n) = g(n) + h(n)$. This
          algorithm has been implemented to mimic the BDDA* algorithm. It expands
          exactly the same states in each iteration. Any performance difference
          between the two algorithms is due to efficiency differences between
          state-set branching and the approach used by BDDA*.²

Bidir   : The BDD-based blind breadth-first bidirectional search algorithm shown
          in Figure 3.4.

A*      : Single-state A* with cycle detection, explicit state manipulation, and
          evaluation function $f(n) = g(n) + h(n)$.

BDDA*   : The BDDA* algorithm [53] shown in Figure 4.7.

iBDDA*  : An improved version of BDDA* described below.

In order to factor out differences due to state encodings and BDD computations, all
BDD-based algorithms use the same bit vector representation of states, the same variable
ordering of the state variables, and similar space allocation and cache sizes of the BDD
package. We believe the resulting empirical validation is extensive. This care is necessary since a
dissimilarity in just one of the above mentioned properties may cause an exponential
performance difference. All algorithms share as many subcomputations as possible, but redundant
or unnecessary computations are never carried out for a particular instantiation of an
algorithm. The following table shows the measured performance parameters of BIFROST.

$t_{total}$  : The total elapsed CPU time of BIFROST.

$t_{rel}$    : Time to generate the transition relation. For BDDA* and iBDDA*, this also
               includes building the symbolic representation of the heuristic function
               and $f$-formulas.

$t_{search}$ : Time to search for and extract a solution.

$|sol|$      : Solution length.

$|expand|$   : For Bidir this is the average size of the BDDs representing the search
               frontier. For fSetA* and GhSetA*, it is the average size of BDDs of
               search nodes being expanded. For BDDA* and iBDDA*, it is the average
               size of $open''$.

$|maxQ|$     : Maximum number of queue nodes on the frontier queue.

$|T|$        : The sum of the BDDs representing the partitioned transition relation.

$it$         : Number of iterations of the algorithm.

Time is measured in seconds. The time $t_{total} - t_{rel} - t_{search}$ is spent on allocating memory
for the BDD package, parsing the problem description and, in case of PDDL problems,
analysing the problem in order to make a compact Boolean state encoding. Time out and
out of memory are indicated by Time and Mem. The time-out limit varies between the
experiments. The algorithms are out of memory when they start page faulting to the hard drive at
approximately 450 MB RAM.

Our experiments cover a wide range of search domains and heuristics. The first domain
$FG^k$ is artificial and uses the minimum Hamming distance as heuristic function. It
demonstrates that state-set branching may have exponentially better performance than single-state
A*. Next, we consider the $D^xV^yM^z$ Puzzle and the 24- and 35-Puzzle using minimum
Hamming distance and sum of Manhattan distances as heuristic function, respectively. We
then consider a number of STRIPS planning problems from the AIPS planning
competitions [113, 4, 115] using the HSPr heuristic [20] and finally, we study the channel routing
problem from VLSI design using a specialized heuristic function.

4.3.1  Search  Problems 

FGk 

This problem is a modification of Barrett and Weld's $D^1S^1$ problem [8] and has been
constructed to show that state-set branching may have exponentially better performance than
single-state A*. The problem is easiest to describe in STRIPS. Thus, a state is a set of facts
and actions are fact triples defining sets of transitions. The actions are

    $A^1_1$              $A^1_i$, $i = 2, \ldots, n$      $A^2_i$, $i = 1, \ldots, n$
    pre : {F*}           pre : {F*, G_{i-1}}              pre : ∅
    add : {G_1}          add : {G_i}                      add : {F_i}
    del : {}             del : {}                         del : {F*}.

Each action is assumed to have unit cost. The initial state is $\{F^*\}$ and the goal state is
$\{G_i \mid k < i \le n\}$. Only $A^1_i$ actions should be applied to reach the goal. Applying an $A^2_i$
action in any state leads to a wild path since $F^*$ is deleted. The states on wild paths contain
$F_i$ facts. Since any subset of $F_i$ facts is possible, the number of states on wild paths grows
exponentially with $n$. The heuristic function is the minimum Hamming distance to the
goal states. The only solution is $A^1_1, \ldots, A^1_n$ and is non-trivial to find, since the heuristic
gives no information to guide the search on the first $k$ steps. Intuitively, the problem can
be thought of as walking blindfolded on a sharp ridge for $k$ steps and then with full vision
for the remaining $n - k$ steps. A single wrong step has an exponential search penalty of
exploring wild paths.
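
The lack of guidance on the first $k$ steps can be checked on a small instance. The Python sketch below encodes the $FG^k$ actions as STRIPS triples and records the Hamming heuristic along the solution $A^1_1, \ldots, A^1_n$. The fact and action names are ad hoc strings chosen for the example, and the add list $\{F_i\}$ of the $A^2_i$ actions is an assumption inferred from the statement that states on wild paths contain $F_i$ facts.

```python
def make_fgk_actions(n):
    """Build the FG^k actions as (name, pre, add, delete) over string facts."""
    actions = [("A1_1", {"F*"}, {"G1"}, set())]
    for i in range(2, n + 1):
        actions.append((f"A1_{i}", {"F*", f"G{i - 1}"}, {f"G{i}"}, set()))
    for i in range(1, n + 1):
        # Assumed add list {F_i}; precondition is empty, F* is deleted.
        actions.append((f"A2_{i}", set(), {f"F{i}"}, {"F*"}))
    return actions


def apply_action(state, action):
    """Return the successor state, or None if the precondition does not hold."""
    _name, pre, add, delete = action
    if not pre <= state:
        return None
    return frozenset((state - delete) | add)


def hamming_to_goal(state, n, k):
    """Number of goal facts G_i, k < i <= n, still missing from the state."""
    goal = {f"G{i}" for i in range(k + 1, n + 1)}
    return len(goal - state)


n, k = 5, 2
actions = {a[0]: a for a in make_fgk_actions(n)}

# Follow the only solution and record the heuristic value after every step.
state = frozenset({"F*"})
h_values = [hamming_to_goal(state, n, k)]
for i in range(1, n + 1):
    state = apply_action(state, actions[f"A1_{i}"])
    h_values.append(hamming_to_goal(state, n, k))

# One wrong step deletes F*, after which no A1 action is applicable.
wild = apply_action(frozenset({"F*"}), actions["A2_1"])
stuck = all(apply_action(wild, actions[f"A1_{i}"]) is None for i in range(1, n + 1))
```

For $n = 5$ and $k = 2$ the heuristic stays at 3 for the first two steps and only then decreases by one per step, which matches the blindfolded-ridge picture.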

In this experiment, we compare only the total CPU time and number of iterations of
GhSetA* and single-state A*. The $FG^k$ problems are defined in NADL+. A specialized
poly-time BDD operation for splitting NADL+ actions into transitions with the same search
information change is used for GhSetA*. No upper bound ($u = \infty$) is used by GhSetA*
and no upper limit of the branching partitions is applied. For the $FG^k$ problems considered,
$n$ equals 16. This corresponds to a domain with $2^{33}$ states. Time out is 600 seconds.
The results are shown in Figure 4.8. The performance of A* degrades quickly with the
number of unguided steps. A* gets lost expanding an exponentially growing set of states
on wild paths. The GhSetA* algorithm is hardly affected by the lack of guidance. The
reason is that GhSetA* degenerates to a regular BDD-based blind forward search on the
unguided part where the frontier states can be represented by a near symmetric function
with polynomial BDD size. Thus, the performance difference between A* and GhSetA*
grows exponentially with $k$.

Figure 4.8: Total CPU time of the $FG^k$ problems (horizontal axis: number of unguided steps $k$).

DxVyMz 

This problem has the minimum Hamming distance as an admissible heuristic. The domain
consists of a set of sliders that can be moved between the corner positions of hypercubes.
In any state, a corner position can be occupied by at most one slider. The dimension of
the hypercubes is $y$. There are $z$ sliders of which $x$ are moving on the same cube. The
remaining $z - x$ sliders are moving on individual cubes. The sliders are numbered. Initially,
they are given corner positions that, when encoded in binary, correspond to an ascending
order of their numbers. The goal is to change their positions to a descending order. Each
action is assumed to have unit cost. Figure 4.9 shows the initial state of $D^5V^3M^7$.

Figure 4.9: The initial state of $D^5V^3M^7$.

When $x = z$ all sliders are moving on the same cube. If further $x = 2^y - 1$ all corners
of the cube except one will be occupied making it a permutation problem similar to the
8-Puzzle. The key point about this problem is that the $x$ parameter allows the dependency of
sliders to be adjusted linearly without changing the size of the domain. For the BDD-based
algorithms, the $D^xV^4M^{15}$ problems are defined in NADL+. Again, a specialized poly-time
BDD operation for splitting NADL+ actions into transitions with the same search
information change is applied by GhSetA* and fSetA*. For all problems, the number of
states is $2^{60}$. For GhSetA* the upper bound for node merging is 200 ($u = 200$). All BDD-based
algorithms except BDDA* utilize a disjunctive partitioning with an upper bound on
the BDDs representing a partition of 5000. Time out is 500 seconds. For all problems,
the BDD-based algorithms use 2.3 seconds on initializing the BDD package ($n = 8M$ and
$c = 700K$). The results are shown in Table 4.1. Figure 4.10 shows a graph of the total CPU
time for the algorithms.

Figure 4.10: Total CPU time of the $D^xV^4M^{15}$ problems.


All solutions found are 34 steps long. Even when the largest number of sliders are on
the same cube, a plan with the minimum 34 steps is possible. For BDDA* and iBDDA*
the size of the BDD representing the heuristic function is 2014 and 1235, respectively. Both
the size of the monolithic and partitioned transition relation grows fast with the dependency
of sliders. The problem is that there is no efficient way to model whether a position is
occupied or not. The most efficient algorithm is GhSetA*. The fSetA* algorithm has worse
performance than GhSetA* because it has to expand all states with minimum $f$-value in
each iteration, whereas GhSetA* focuses on a subset of them by having $u = 200$. A
subexperiment shows that GhSetA* has similar performance to fSetA* when setting $u = \infty$.
The impact of the $u$ parameter is significant for this problem since, even for fairly large
values of $x$, it has an abundance of optimal solutions. BDDA* has much worse
performance than fSetA* even though it expands the exact same set of states in each iteration.
As we show in Section 4.4, the problem is that the complexity of the computation of $open''$
grows fast with the size of the BDD representing the states to expand. Surprisingly, the
performance of iBDDA* is worse than BDDA*. This is unusual, as the remaining
experiments will show. The reason might be that only a little space is saved by partitioning the
transition relation in this domain. This may cause the computation of $open''$ for iBDDA*
to deteriorate because it must iterate through all the partitions. A* performs well when
$f(n)$ is a perfect or near perfect discriminator, but it soon gets lost in keeping track of the
fast growing number of states on optimal paths. It times out in a single step going from
about one second to more than 500 seconds. The problem for Bidir is the usual one for blind
BDD-based search algorithms applied to hard combinatorial problems: the BDDs
representing the search frontiers blow up, which increases the time of the image and preimage
computations dramatically.

The  24  and  35-Puzzle 

We  now  turn  to  investigating  the  (n2  —  1)-Puzzles.  The  domain  consists  of  an  n  x  n  board 
with n² − 1 numbered tiles and a blank space. A tile adjacent to the blank space can slide
into the space. The goal is to reach a configuration where the tiles are ordered ascendingly
as shown for the 24-Puzzle in Figure 4.11.

Figure 4.11: Goal state of the 24-Puzzle.

Algorithm   x  t_total  t_rel  t_search  |expand|  |Q|max        T   it
GhSetA*     1      2.7    0.3       0.2     307.3      33      710   34
            2      2.8    0.3       0.2     307.3      33     1472   34
            3      3.1    0.4       0.3     671.0      33     4070   34
            4      3.2    0.5       0.4     441.7      72    10292   34
            5      3.1    0.4       0.4     194.8     120    20974   34
            6      3.3    0.6       0.4     139.9     212    45978   34
            7      3.9    1.0       0.5     128.4     322   104358   34
            8      4.9    1.9       0.6     115.9     438   232278   34
            9      8.1    5.0       0.8     132.0     557   705956   34
           10     29.5   14.3      12.8     146.1    5103  1970406  373
           11     46.9   43.8       0.8     107.3     336  5537402   34
           12      Mem
fSetA*      1      2.7    0.3       0.2     307.3       1      710   34
            2      2.8    0.3       0.2     307.3       1     1472   34
            3      3.1    0.4       0.4     671.0       1     4070   34
            4      3.3    0.4       0.6     671.0       1    10292   34
            5      5.1    0.5       2.3    1778.6       1    20974   34
            6      9.6    0.6       6.6    2976.5       1    45978   34
            7     37.5    1.0      34.2    9046.7       1   104358   34
            8     63.4    2.0      59.1    9046.7       1   232278   34
            9    408.3    4.9     401.1   24175.4       1   705956   34
           10     Time
BDDA*       1      3.6    0.5       0.4     314.3       -      355   34
            2      3.9    0.5       0.6     314.3       -      772   34
            3      4.6    0.6       1.3     678.0       -     2128   34
            4      5.5    0.8       2.0     678.0       -     6484   34
            5     10.2    1.3       6.2    1785.6       -    20050   34
            6     56.4    3.4      50.4    2983.5       -    64959   34
            7    214.8   10.8     201.1    9053.7       -   234757   34
            8    312.1   52.7     256.1    9053.7       -   998346   34
            9     Time
iBDDA*      1      4.0    0.4       0.8     307.3       -      355   34
            2      4.2    0.4       1.1     307.3       -      772   34
            3      5.1    0.5       1.9     671.0       -     2128   34
            4      6.2    0.4       3.0     671.0       -     6791   34
            5     33.7    0.4      30.4    1778.6       -    25298   34
            6    117.6    0.5     113.9    2976.5       -    84559   34
            7     Time
A*          1      1.1      -         -         -    1884        -   34
            2      1.1      -         -         -    1882        -   34
            3      1.0      -         -         -    1770        -   34
            4      1.0      -         -         -    1750        -   34
            5      0.9      -         -         -    1626        -   34
            6     Time
Bidir       1      2.7    0.2       0.1     568.5       -      355   34
            2      2.7    0.2       0.2     630.8       -      772   34
            3      3.2    0.3       0.7    2305.1       -     2128   34
            4      5.2    0.2       2.6    3131.1       -     5159   34
            5    278.9    0.2     276.4   30445.0       -    10610   34
            6     Time

Table 4.1: Results of the DxV4M15 problems.
A dash (-) marks a value not reported for that algorithm.

For our experiments, the initial state is generated by performing r random moves from the
goal state.³ We assume unit cost transitions
and use the usual sum of Manhattan distances of the tiles to their goal position as heuristic
function. This heuristic function is admissible. For GhSetA* and fSetA*, a disjunctive
branching partitioning is easy to compute since δh of an action changing the position of
a single tile is independent of the position of the other tiles. The two algorithms have no
upper bound on the size of BDDs in the frontier nodes (u = ∞). For the BDD-based
algorithms, the problems are defined in NADL+ and the best results are obtained when having
no limit on the partition size. Thus, BDDA*, iBDDA*, and Bidir use a monolithic
transition relation. The number of states for the 24-Puzzle is 2^120. The results of this problem
are shown in Table 4.2. For all 24-Puzzle problems, the BDD-based algorithms spend 3.6
seconds on initializing the BDD package (n = 15M and c = 500K). Time out is 10000
seconds. For BDDA* and iBDDA* the size of the BDD representing the heuristic
function is 33522 and 18424, respectively. For GhSetA* and fSetA* the size of the transition
relations is 70582, while the size of the transition relation for BDDA* and iBDDA* is
66673. Thus, a small amount of space was saved by using a monolithic transition relation
representation. However, GhSetA* and fSetA* have better performance than BDDA*
and iBDDA*, mostly due to their more efficient node expansion computation. Interestingly,
both BDDA* and iBDDA* spend significant time computing the heuristic function
in this domain. GhSetA* and fSetA* also scale better than A* and Bidir. A* has
good performance because it does not have the substantial overhead of computing the
transition relation and finding actions to apply. However, due to the explicit representation of
states, it runs out of memory for solution depths above 50. For Bidir, the problem is the
usual: the BDDs representing the search frontiers blow up. Figure 4.12 shows a graph of
the total CPU time of the 24- and 35-Puzzle. Again, time out is 10000 seconds.
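
The experimental set-up above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation evaluated here: the goal layout (blank in the first cell) is an assumption because Figure 4.11 is not reproduced in the text, and the names `manhattan`, `delta_h`, and `random_instance` are hypothetical. The sketch shows why the disjunctive branching partitioning is easy to compute: δh of a move depends only on the tile that moves.

```python
import random

N = 5  # 24-Puzzle: a 5 x 5 board
# Assumed goal layout: blank (0) in the first cell, tiles 1..24 in ascending order.
GOAL = tuple(range(N * N))


def manhattan(state):
    """Sum of Manhattan distances of all tiles (not the blank) to their goal cells."""
    total = 0
    for pos, tile in enumerate(state):
        if tile != 0:
            goal_pos = GOAL.index(tile)
            total += abs(pos // N - goal_pos // N) + abs(pos % N - goal_pos % N)
    return total


def neighbors(pos):
    """Board cells adjacent to cell `pos`."""
    row, col = divmod(pos, N)
    if row > 0:
        yield pos - N
    if row < N - 1:
        yield pos + N
    if col > 0:
        yield pos - 1
    if col < N - 1:
        yield pos + 1


def slide(state, tile_pos):
    """Slide the tile at `tile_pos` into the adjacent blank cell."""
    blank = state.index(0)
    cells = list(state)
    cells[blank], cells[tile_pos] = cells[tile_pos], 0
    return tuple(cells)


def delta_h(state, tile_pos):
    """Change of the heuristic when the tile at `tile_pos` slides into the blank.

    Only the moved tile's own distance changes, so no other tile is inspected.
    """
    tile, blank = state[tile_pos], state.index(0)
    goal_pos = GOAL.index(tile)

    def dist(p):
        return abs(p // N - goal_pos // N) + abs(p % N - goal_pos % N)

    return dist(blank) - dist(tile_pos)


def random_instance(r, seed=0):
    """Perform r random moves from the goal, never undoing the previous move."""
    rng = random.Random(seed)
    state, previous_blank = GOAL, None
    for _ in range(r):
        blank = state.index(0)
        options = [p for p in neighbors(blank) if p != previous_blank]
        tile_pos = rng.choice(options)
        state, previous_blank = slide(state, tile_pos), blank
    return state
```

Since every move changes the heuristic by exactly +1 or −1, the transitions of this domain fall into two δh classes, which is the property the branching partitioning exploits.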

³ In each of these steps, choosing the move back to the previous state is illegal.

Algorithm    r  t_total  t_rel  t_search  |sol|   |expand|  |Q|max      it
GhSetA*    140     28.8   22.1       2.7     26      187.5      23      93
           160     30.0   22.2       3.8     28      213.2      24     175
           180     31.4   22.2       5.3     32      270.2      28     253
           200     43.7   21.9      14.9     36      786.2      31     575
           220     36.3   22.2      10.1     36      411.1      31     490
           240    199.3   22.0     173.2     50     2055.5      44    1543
           260   5673.7   23.9    5644.5     56    10641.2      48    2576
           280      Mem
           300   4772.7   20.9   4743.97     60     9761.3      53    2705
           320      Mem
fSetA*     140     29.7   21.0       4.7     26      669.9       1      42
           160     32.2   20.9       7.4     28     1051.6       1      57
           180     34.3   21.0       9.5     32     1207.0       1      69
           200     50.1   21.0      25.3     36     5276.0       1      93
           220     41.8   21.0      17.0     36     3117.6       1      88
           240    205.2   21.0     180.5     50    18243.3       1     156
           260      Mem
BDDA*      140     98.5   83.0      11.3     26      676.9       -      42
           160    114.7   83.2      27.4     28     1058.6       -      57
           180    129.8   82.9      42.7     32     1214.0       -      69
           200    425.0   83.1     337.1     36     5283.0       -      93
           220    267.7   82.8     180.6     36     3124.6       -      88
           240   4120.1   83.1    4032.8     50    18250.3       -     156
           260     Time
iBDDA*     140     79.8   66.7       5.9     26      669.9       -      42
           160     85.3   65.7      11.8     28     1051.6       -      57
           180     93.6   65.7      20.0     32     1207.0       -      69
           200    314.6   65.8     240.9     36     5276.0       -      93
           220    156.9   65.6      83.5     36     3117.6       -      88
           240   2150.3   65.9    2076.6     50    18243.3       -     156
           260      Mem
A*         140      0.1      -         -     26          -     300     221
           160      0.9      -         -     28          -     725     546
           180      0.6      -         -     32          -    1470    1106
           200      7.4      -         -     36          -   15927   12539
           220      2.3      -         -     36          -    5228    4147
           240     87.1      -         -     50          -  159231  133418
           260      Mem
Bidir      140     68.1   36.6      27.9     26    34365.2       -      26
           160     96.0   36.8      55.6     28    55388.4       -      28
           180    214.7   36.8     174.3     32   106166.0       -      32
           200   1286.0   36.8    1245.6     36   359488.0       -      36
           220   3168.8   36.8    3128.4     36   421307.0       -      36
           240      Mem

Table 4.2: Results of the 24-Puzzle problems.
A dash (-) marks a value not reported for that algorithm.

Figure 4.12: Total CPU time for the 24- and 35-Puzzle problems (panels: 24-puzzle, 35-puzzle).

4.3.2 Planning Problems

In this section, we consider four planning problems from the STRIPS track of the AIPS
1998 [113], 2000 [4], and 2002 [115] planning competitions. The problems are defined in
the STRIPS fragment of PDDL. The reachability analysis necessary to compactly encode
STRIPS domains described in Section 3.1 is based on an approach described in [48]. It is
fast for the problems considered in this experimental evaluation (for most problems less than
0.04 seconds). The algorithm proceeds in a breadth-first manner such that each ground
predicate or fact f can be assigned a depth d(f) where it is reached. Similar to the MIPS
planning system [48], we use this measure to approximate the HSPr heuristic [20]. HSPr is
an efficient but non-admissible heuristic for backward search. For a state given by a set of
facts S, the approximation to HSPr is given by⁴

\[ h(S) = \sum_{f \in S} d(f). \]

A disjunctive branching partitioning for this heuristic is efficient to generate given that
each action (pre, add, del) leading from S to S' = (S ∪ add) \ del satisfies

\[ del \subseteq pre \quad \text{and} \quad add \cap pre = \emptyset. \]

These  requirements  are  natural  and  satisfied  by  all  the  planning  domains  considered  in  this 
experimental  evaluation.  Due  to  the  constraints,  we  get 

\[
\begin{aligned}
\delta h &= h(S') - h(S) \\
         &= h(add \setminus S) - h(del) \\
         &= \sum_{f \in add \setminus S} d(f) - \sum_{f \in del} d(f).
\end{aligned}
\]

Thus, each action is partitioned in up to 2^|add| sets of transitions with different δh-value.
In order to simplify the computation of the initial heuristic value, all problems have been
modified to a single goal state. Furthermore, in domains where the HSPr approximation
systematically under- or overestimates the true remaining cost, we have scaled it
accordingly.
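
The depth measure d(f), the heuristic h(S), and the δh classes of an action can be illustrated with a small Python sketch. It is a sketch under stated assumptions rather than the encoding used in the experiments: actions are represented as (pre, add, del) triples of fact sets, and the function names `fact_depths`, `delta_h`, and `delta_h_classes` are hypothetical.

```python
from itertools import combinations


def fact_depths(init, actions):
    """Breadth-first relaxed reachability: depth d(f) at which fact f is first reached.

    Delete lists are ignored, so a fact stays reached once it has been added.
    """
    depth = {f: 0 for f in init}
    level = 0
    while True:
        level += 1
        reached = set(depth)
        new = set()
        for pre, add, _dele in actions:
            if pre <= reached:
                new |= add - reached
        if not new:
            return depth
        for f in new:
            depth[f] = level


def h(facts, depth):
    """Approximated heuristic: sum of the depths of the given facts."""
    return sum(depth[f] for f in facts)


def successor(state, action):
    """Apply an action: S' = (S | add) - del."""
    pre, add, dele = action
    assert pre <= state, "action is not applicable"
    return (state | add) - dele


def delta_h(state, action, depth):
    """delta-h of a transition, valid when del is a subset of pre and add is disjoint from pre."""
    pre, add, dele = action
    assert dele <= pre and not (add & pre)
    return h(add - state, depth) - h(dele, depth)


def delta_h_classes(action, depth):
    """Group the subsets of `add` that may be missing from S by their delta-h value.

    There are at most 2^|add| such subsets, hence at most 2^|add| classes.
    """
    _pre, add, dele = action
    classes = {}
    for k in range(len(add) + 1):
        for missing in combinations(sorted(add), k):
            value = h(missing, depth) - h(dele, depth)
            classes.setdefault(value, []).append(frozenset(missing))
    return classes
```

Each key of the dictionary returned by `delta_h_classes` corresponds to one set of transitions of the action with a common δh-value; the subsets listed under a key describe which added facts are new in the states of that set.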

Blocks  World 

The Blocks World is a classical planning domain. It consists of a set of cubic blocks sitting
on a table. A robot arm can stack and unstack blocks from some initial configuration to
a goal configuration. The problems we consider are from the STRIPS track of the AIPS
2000 planning competition. The number of states grows from 2^17 to 2^80. The HSPr heuristic
is scaled by a factor of 0.4. The GhSetA* and fSetA* algorithms have no upper bound
on the size of BDDs of the nodes on the frontier (u = ∞). For all BDD-based algorithms,
the partition limit was 5000. For each problem, these algorithms spend about 2.5 seconds
on initializing the BDD package (n = 8M and c = 800K). Time out is 500 seconds in all
experiments. The results are shown in Table 4.3. The top graph of Figure 4.13 shows the
total CPU time of the algorithms.

⁴ This is an approximation to the HSPr heuristic since the HSPr heuristic for a fact f estimates the number
of actions needed to produce f from the initial state if the delete set of actions is ignored. By measuring
the depth of f in a forward reachability analysis from the initial state, we only consider the depth of this
dependency tree of actions.

Algorithm   p  t_total  t_rel  t_search  |sol|  |expand|  |Q|max    it      m
GhSetA*     4      2.6    0.0       0.0      6      19.5       1     6    706
            5      2.7    0.1       0.1     12      33.4      11    31   1346
            6      2.6    0.1       0.1     12      57.7       9    30   2608
            7      3.1    0.2       0.4     20      53.8      48   152   4685
            8      4.1    0.3       1.3     18     540.4      12    72   7475
            9     17.0    0.4      14.1     32     331.8      94   991   8717
           10    116.2    0.6     113.1     38     744.9     111  2309  11392
           11    133.5    0.7     130.2     32    1404.9      91  1200  16122
           12     14.8    1.0      11.2     34     410.3     120   557  18734
           13     Time
           14    112.1    1.7     107.8     38    1067.8     125  1061  30707
           15     Time
fSetA*      4      2.5    0.0       0.0      6      29.8       1     6    706
            5      2.7    0.1       0.1     12      68.7       4    23   1346
            6      2.7    0.1       0.1     12     126.8       2    20   2608
            7      3.2    0.2       0.5     20     121.9       8    92   4685
            8      3.9    0.3       1.1     18    1328.8       2    35   7475
            9     30.0    0.4      27.1     32     935.5      10   610   8717
           10    217.0    0.6     213.8     38    2594.4      12  1098  11392
           11    259.8    0.8     256.4     32    4756.0       9   671  16122
           12     39.2    1.0      35.7     34     817.0      13   860  18734
           13     Time
           14    274.3    1.7     270.0     38    1555.1      13  1462  30707
           15     Time
BDDA*       4      3.3    0.0       0.1      6      37.8       -     6    706
            5      3.6    0.2       0.2     12      76.7       -    23   1365
            6      3.6    0.2       0.2     12     134.8       -    20   2334
            7      4.9    0.5       1.2     20     129.9       -    92   4669
            8      6.0    0.5       2.2     18    1336.8       -    35   6959
            9    100.8    1.1      96.5     32     943.5       -   610   9923
           10     Time
iBDDA*      4      2.7    0.0       0.0      6      29.8       -     6    706
            5      2.8    0.1       0.1     12      68.7       -    23   1365
            6      2.9    0.1       0.1     12     126.8       -    20   2334
            7      3.7    0.3       0.7     20     121.9       -    92   4669
            8      6.2    0.4       3.2     18    1328.8       -    35   7123
            9    113.7    0.6     110.3     32     935.5       -   610  10361
           10     Time
A*          4      0.0      -       0.0      6         -       8    15      -
            5      0.2      -       0.2     12         -      62    70      -
            6      0.4      -       0.4     12         -     115   102      -
            7      1.3      -       1.2     20         -     287   287      -
            8     31.9      -      31.6     18         -    7787  5252      -
            9    233.9      -     232.9     32         -   38221 31831      -
           10     Time
Bidir       4      2.6    0.0       0.0      6     124.5       -     6    706
            5      2.6    0.1       0.0     12     228.3       -    12   1423
            6      2.7    0.1       0.1     12     438.8       -    12   2567
            7      3.6    0.2       0.8     20    1931.3       -    20   5263
            8      9.7    0.3       6.8     18   11181.8       -    18   8157
            9    146.8    0.4     143.9     30   75040.9       -    30  11443
           10     Time

Table 4.3: Results of the Blocks World problems.
A dash (-) marks a value not reported for that algorithm.

For BDDA* and iBDDA* the size of the BDD representing the heuristic function is in
the range of [8, 1908] and [8, 1000], respectively. The GhSetA* and fSetA* algorithms
have significantly better performance than all other algorithms. As usual, BDDA* and
iBDDA* suffer from an inefficient expansion computation, while the frontier BDDs blow
up for Bidir. The general A* algorithm for STRIPS planning problems is less domain-tuned
than the previous A* implementations. In particular, it must check the precondition
of all actions in each iteration in order to find the ones that are applicable. This may explain
the poor performance of A*.
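
The per-iteration cost of the precondition check mentioned above can be made concrete with a short Python sketch. It is a hypothetical illustration, not the A* implementation used in the experiments: ground actions are stored in a dictionary mapping an assumed action name to a (pre, add, del) triple, and every expansion scans all of them.

```python
def applicable_actions(state, actions):
    """Scan every ground action and keep those whose precondition holds in `state`."""
    return [name for name, (pre, _add, _dele) in actions.items() if pre <= state]


def expand(state, actions):
    """Successors of `state`; each ground action is tested once per expansion."""
    successors = []
    for name in applicable_actions(state, actions):
        _pre, add, dele = actions[name]
        successors.append((name, (state | add) - dele))
    return successors


# A two-block toy instance (fact and action names are illustrative only).
ACTIONS = {
    "pickup_a": (
        frozenset({"clear_a", "ontable_a", "arm_empty"}),
        frozenset({"holding_a"}),
        frozenset({"clear_a", "ontable_a", "arm_empty"}),
    ),
    "pickup_b": (
        frozenset({"clear_b", "ontable_b", "arm_empty"}),
        frozenset({"holding_b"}),
        frozenset({"clear_b", "ontable_b", "arm_empty"}),
    ),
    "stack_a_b": (
        frozenset({"holding_a", "clear_b"}),
        frozenset({"on_a_b", "clear_a", "arm_empty"}),
        frozenset({"holding_a", "clear_b"}),
    ),
}
INIT = frozenset({"clear_a", "clear_b", "ontable_a", "ontable_b", "arm_empty"})
```

The scan in `applicable_actions` is linear in the number of ground actions, so its cost per expanded state grows with the size of the problem even when only a few actions are applicable.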

Gripper 

The Gripper problems are from the first round of the STRIPS track of the AIPS 1998
planning competition. The domain consists of two rooms, A and B, connected with a door
and a robot with two grippers. Initially, a number of balls are located in room A, and the
goal is to move them to room B. The number of states grows linearly from 2^12 to 2^88.
The GhSetA* and fSetA* algorithms have no upper bound on the size of BDDs in the
frontier nodes (u = ∞). For all BDD-based algorithms no partition limit is used, and they
spend about 0.8 seconds on initializing the BDD package (n = 2M and c = 400K). All
algorithms generate optimal solutions. The results are shown in Table 4.4. The bottom
graph of Figure 4.13 shows the total CPU time of the algorithms. Interestingly, Bidir is
the fastest algorithm in this domain since the BDDs representing the search frontier only
grow moderately during the search. The GhSetA* and fSetA* algorithms, however, have
almost as good performance. BDDA* and iBDDA* have particularly bad performance in
this domain. The problem is that the BDDs of their frontier nodes are large compared
to other domains and that the expansion computation of these algorithms seems to scale
poorly. We will investigate this problem in detail in Section 4.4.

Logistics 

The logistics domain considers moving packages with trucks between sub-cities and with
airplanes between cities. The problems considered are from the STRIPS track of the AIPS
2000 planning competition. The number of states grows from 2^21 to 2^86. The GhSetA*
and fSetA* algorithms have no upper bound on the size of BDDs in the frontier nodes
(u = ∞). For all BDD-based algorithms, a partition limit of 5000 is used and they spend
about 2.0 seconds on initializing the BDD package (n = 8M and c = 400K). Due to
systematic underestimation, the HSPr heuristic is scaled with a factor of 1.5. The top
graph of Figure 4.14 shows the total CPU time of the algorithms.

Algorithm   p  t_total  t_rel  t_search  |expand|  |Q|max    it      m
GhSetA*     2      0.9    0.1      0.02      68.8       5    21    594
            4      1.0    0.1      0.08     168.9       6    43   1002
            6      1.3    0.2      0.27     314.9       6    65   1410
            8      1.5    0.3      0.34     504.8       6    87   1818
           10      1.8    0.4      0.54     738.1       6   109   2226
           12      2.3    0.5      0.88    1014.7       6   131   2634
           14      3.0    0.7      1.33    1334.5       6   153   3042
           16      3.6    0.9      1.78    1697.5       6   175   3450
           18      4.5    1.1      2.46    2103.7       6   197   3858
           20      5.7    1.4      3.37    2553.1       6   219   4266
fSetA*      2      1.0    0.1       0.1      95.4       1    17    594
            4      1.0    0.1       0.1     231.2       1    29   1002
            6      1.2    0.2       0.2     423.9       1    41   1410
            8      1.6    0.3       0.3     673.4       1    53   1818
           10      2.0    0.4       0.6     979.9       1    65   2226
           12      2.5    0.6       1.0    1343.3       1    77   2634
           14      3.1    0.8       1.4    1763.5       1    89   3042
           16      3.7    0.9       1.9    2240.7       1   101   3450
           18      5.0    1.2       2.9    2774.7       1   113   3858
           20      5.7    1.5       3.2    3365.6       1   125   4266
BDDA*       2      1.8    0.1       0.2     103.4       -    17    323
            4      2.4    0.2       0.6     239.2       -    29    539
            6      3.4    0.3       1.5     431.9       -    41    755
            8      6.1    0.6       4.0     681.4       -    53    971
           10     16.9    0.9      14.4     987.9       -    65   1187
           12     40.7    1.2      37.9    1351.3       -    77   1403
           14     81.7    1.6      78.5    1771.5       -    89   1619
           16    149.3    2.2     145.4    2248.7       -   101   1835
           18    240.4    3.1     235.5    2782.7       -   113   2051
           20    391.1    3.9     385.5    3373.6       -   125   2267
iBDDA*      2      1.2    0.1       0.1      95.4       -    17    323
            4      1.6    0.1       0.4     231.2       -    29    539
            6      2.3    0.3       1.0     423.9       -    41    755
            8      3.6    0.4       2.2     673.4       -    53    971
           10      6.2    0.6       4.5     979.9       -    65   1187
           12     12.2    0.9       9.2    1343.3       -    77   1403
           14     23.5    1.1      21.3    1763.5       -    89   1619
           16     44.8    1.6      42.1    2240.7       -   101   1835
           18     76.1    2.2      72.4    2774.7       -   113   2051
           20    120.9    2.7     116.7    3365.6       -   125   2267
A*          2      3.9      -       3.9         -     698  1286      -
            4    422.9      -     422.3         -   26434 85468      -
            6     Time
Bidir       2      0.9    0.1       0.0     125.4       -    17    323
            4      1.0    0.1       0.1     290.9       -    29    539
            6      1.2    0.2       0.1     589.7       -    41    755
            8      1.4    0.3       0.3     958.2       -    53    971
           10      1.7    0.4       0.5    1404.3       -    65   1187
           12      2.2    0.5       0.8    1611.0       -    77   1403
           14      2.6    0.7       1.0    2025.6       -    89   1619
           16      3.2    0.9       1.3    3265.6       -   101   1835
           18      3.8    1.2       1.7    4074.4       -   113   2051
           20      4.5    1.5       2.1    4944.9       -   125   2267

Table 4.4: Results of the Gripper problems.
A dash (-) marks a value not reported for that algorithm.

Figure 4.13: Total CPU time for the Blocks World and Gripper problems (panels: Blocks World, Gripper).

ZenoTravel

ZenoTravel is from the STRIPS track of the AIPS 2002 planning competition. It involves
transporting people around in planes, using different modes of movement: fast and slow.
The number of states grows from 2^9 to 2^165. The GhSetA* and fSetA* algorithms have
no upper bound on the size of BDDs in the frontier nodes (u = ∞). For all BDD-based
algorithms a partition limit of 4000 is used. About 2.7 seconds is spent on initializing the
BDD package (n = 10M and c = 700K). The bottom graph of Figure 4.14 shows the
total CPU time of the algorithms. The results are very similar to the results of the logistics
problems.

Figure 4.14: Total CPU time for the Logistics and ZenoTravel problems (panels: Logistics,
ZenoTravel). Problem 10 of ZenoTravel can only be solved by GhSetA* and fSetA*.


4.3.3  Channel  Routing  Problems 

Channel routing is a fundamental subtask in the layout process of VLSI design. It is an
NP-complete problem, which makes exact solutions hard to produce. Channel routing considers
connecting pins in the small gaps or channels between the cells of a chip. In its usual
formulation, two layers are used for the wires: one where wires go horizontally (tracks) and
one where wires go vertically (columns). In order to change direction, a connection must be
made between the two layers. These connections are called vias. Pins are at the top and
bottom of the channel. A set of pins that must be connected is called a net. The problem
is to connect the pins optimally according to some cost function. The cost function studied
here equals the total number of vias used in the routing. Figure 4.15 shows an example
of an optimal solution to a small channel routing problem. The cost of the solution is
4. One way to apply search to solve a channel routing problem is to route the nets from
left to right. A state in this search is a column paired with a routing of the nets on the
left side of that column. A transition of the search is a routing of live nets over a single
column. Recently, it has been shown that BDD-based channel routing algorithms utilizing
this strategy can scale efficiently in the number of columns [153, 160]. The belief is that
such algorithms can be used to perform subcomputations of a global router that decomposes
the routing into a vertical and horizontal part.

Figure 4.15: A solution to a channel routing problem with 5 columns, 3 tracks, and
2 nets (labeled I and II). The pins are numbered according to the net they belong to.

A*  can  be  used  in  the  usual  way  to  find  optimal  solutions.  An  admissible  heuristic 
function  for  our  cost  function  is  the  sum  of  the  cost  of  routing  all  remaining  nets  optimally 
ignoring  interactions  with  other  nets.  We  have  implemented  a  specialized  search  engine  to 
solve  channel  routing  problems  with  GhSetA*  [90].  The  key  point  about  this  application 
of  state-set  branching  is  that  GhSetA*  utilizes  a  conjunctive  branching  partitioning  instead 
of  a  disjunctive  branching  partitioning  as  in  all  other  experiments  reported  so  far.  This  is 
possible  since  a  transition  can  be  regarded  as  the  joint  result  of  routing  each  net  in  turn. 

The performance of GhSetA* is evaluated using problems produced from two ISCAS-85
circuits [160]. For each of these problems, the parameters of the BDD package are
hand-tuned for best performance. There is no upper bound on the size of BDDs in frontier nodes
(u = ∞) and no limit on the size of the partitions. Time out is 600 seconds. Table 4.5 shows
the results. The performance of GhSetA* is similar to previous applications of BDDs to
channel routing [153, 160, 175]. However, in contrast to previous approaches, GhSetA*
finds optimal solutions, whereas the previous algorithms only find valid solutions.

Circuit  c-t-n       t_total  t_rel  t_search  |Q|max    it
Add      38-3-10         0.2    0.1       0.2       1    40
         47-5-27         0.8    0.7       0.1      24    46
         41-3-12         0.2    0.1       0.1       1    42
         46-7-20         5.0    3.5       1.5      56    89
         25-4-6          0.1    0.0       0.1       1    30
C432     83-4-33         0.4    0.2       0.2       0    93
         89-11-58        Mem
         101-9-57      286.1   61.5     206.6     135   113
         99-8-58        34.0   13.5      20.5      59   448
         97-10-63      295.0   99.7     195.3     129   109
         101-7-53       15.7   11.5       4.2      90   101
         95-9-48       223.8   58.9     164.9      59   399
         95-10-48       Time
         84-5-23         3.2    0.7       2.5       0    92

Table 4.5: Results of the ISCAS-85 channel routing problems. A problem,
c-t-n, is identified by its number of columns (c), tracks (t), and nets (n).

The experimental results, however, show that the benefit of using guided BDD-based search for
channel routing is limited. The reason is that the BDDs representing the search frontier do
not blow up in this domain as in most other planning domains. Instead, the intermediate
BDDs of the image computation blow up, both when this computation is based on a regular
conjunctive partitioning for blind search and when it is utilizing a conjunctive branching
partitioning for guided search. It may be the case, though, that more efficient encodings
of channel routing domains exist. For instance, for the (n² − 1)-Puzzles, it is much more
efficient to encode for each position what tile it holds rather than to encode for each tile
what its position is. The former encoding is redundant compared to the latter because it
also represents the position of the blank space. However, the representation of actions
is substantially simplified in the former encoding since the position of the blank space is
known.
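
The two puzzle encodings contrasted above can be compared with a short Python sketch. It is an illustration under stated assumptions, not the NADL+ encoding used in the experiments: every value is assumed to be stored in a fixed-width binary field, and the function names are hypothetical. The sketch shows the redundancy of the position-based encoding and why it makes the blank directly available.

```python
from math import ceil, log2


def bits_position_to_tile(n):
    """Bits used when each of the n*n positions stores the tile it holds (0 = blank)."""
    cells = n * n
    return cells * ceil(log2(cells))


def bits_tile_to_position(n):
    """Bits used when each of the n*n - 1 tiles stores its position; the blank is implicit."""
    cells = n * n
    return (cells - 1) * ceil(log2(cells))


def blank_from_board(board):
    """Position-based encoding: the blank is stored explicitly as tile 0."""
    return board.index(0)


def blank_from_tiles(tile_pos, n):
    """Tile-based encoding: the blank is the one position that no tile occupies."""
    free = set(range(n * n)) - set(tile_pos.values())
    return free.pop()


# The same 8-Puzzle state in both encodings.
BOARD = (1, 2, 3, 4, 0, 5, 6, 7, 8)  # BOARD[position] = tile
TILE_POS = {tile: pos for pos, tile in enumerate(BOARD) if tile != 0}
```

Under these assumptions the position-based encoding needs one extra field, but a move only has to read the field holding the blank, whereas the tile-based encoding must derive the blank position from all tile positions before an action can be described.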
4.4 Conclusion

We  conclude  this  chapter  by  comparing  state-set  branching  to  single-state  heuristic  search, 
blind  BDD-based  search,  and  BDDA*. 

State-Set  Branching  versus  Single-State  Heuristic  Search 

Heuristic search is trivial if the heuristic function is very informative. In this case,
state-set branching may have worse performance than single-state heuristic search due to the
overhead of computing the transition relation. Thus, we do not expect state-set branching
algorithms to have better performance than the single-state heuristic search algorithms
applied in the AIPS planning competitions because the problems considered have very
strong heuristics [79]. In this experimental evaluation, we consider finding optimal or
near-optimal solutions with state-set branching implementations of A*. The studied heuristic
functions are classical but leave a significant search element for the algorithms to handle.
For these problems, state-set branching outperforms single-state A*. Notice that this result
is consistent with the fact that single-state A* is optimally efficient. The reason is that a
state-set branching implementation of A* may use an exponentially more compact state
representation than single-state A*.

State-Set  Branching  versus  Blind  BDD-based  Search 

Blind BDD-based search has been successfully applied in symbolic model checking and
circuit verification. It has been shown that many problems encountered in practice are
tractable when using BDDs [168]. The classical search problems studied in AI, however,
seem to be harder and have longer solutions than the problems considered in formal
verification. When applying blind BDD-based search to these problems, the BDDs used to
represent the search frontier often grow fast. The experimental evaluation of state-set branching
shows that this problem can be substantially reduced when efficiently splitting the search
frontier according to a heuristic evaluation of the states.

State-Set  Branching  versus  BDDA* 

State-set branching implementations of A* such as GhSetA* and fSetA* are fundamentally
different from BDDA*. BDDA* does not exploit a partitioning of the transitions
according to how they change the g- and h-value. Instead, it imitates the usual explicit
application of the heuristic function via a symbolic computation. It would be reasonable to
expect that the symbolic representation of practical heuristic functions often is very large.
However, this is seldom the case for the heuristic functions studied in this experimental
evaluation. The major challenge for BDDA* is that the arithmetic computations at the
BDD level scale poorly with the size of the BDD representing the set of states to expand
(lines 5 and 6 in Figure 4.7). This hypothesis can be empirically verified by measuring the
CPU time used by fSetA* and iBDDA* to expand a set of states. Recall that fSetA*
and iBDDA* expand the exact same set of states in each iteration. Any performance
difference is therefore solely caused by their expansion techniques. The results are shown in
Figure 4.16. The reported CPU time is the average of the 15-Puzzle with 50, 100, and 200
random steps, Logistics problems 4 to 9, Blocks World problems 4 to 9, Gripper problems 1 to
20, and DxV4M15 with x varying from 1 to 6.

Figure 4.16: Node expansion times of fSetA* and BDDA* (x-axis: size of BDD to expand).

For very small frontier BDDs, iBDDA* is slightly faster than fSetA*. This is probably
because small frontier BDDs mainly are generated by easy problems where a possibly
monolithic transition relation used by iBDDA* is more efficient than the partitioned
transition relation used by fSetA*. However, for large frontier BDDs, BDDA* needs much
more time to expand the frontier than fSetA*. Another limitation of BDDA* is the
inflexibility of BDD-based arithmetic. It makes it hard to extend BDDA* efficiently to
general evaluation functions and arbitrary transition costs.
4.5 Summary

This chapter has introduced a new framework called state-set branching. State-set branching
seamlessly combines BDDs and heuristic search via a state-set version of the classical
best-first search algorithm using a new partitioning technique called branching partitioning.
It has been shown that the framework is general. It applies to any heuristic function,
any cost function, and any node evaluation function. In addition, both disjunctive and
conjunctive versions of branching partitions can be defined. The experimental evaluation
shows state-set branching to be a powerful approach that often outperforms both single-state
heuristic search and blind BDD-based search. Moreover, it has substantially better
performance than the approach used by BDDA*.


Chapter  5 

Non-Deterministic  State-Set  Branching 


A  limitation  of  the  current  BDD-based  non-deterministic  planning  algorithms  is  that  they 
perform  blind  search.  A  backward  search  frontier  is  expanded  in  a  breadth-first  manner 
and the final non-deterministic plan may cover a large number of states that are unreachable
from  the  initial  states.  In  this  chapter,  we  describe  how  to  use  state-set  branching  to  guide 
these  algorithms  [95].  We  begin  in  Section  5.1  by  introducing  a  generic  non-deterministic 
planning  algorithm  for  guided  search.  Then,  in  Section  5.2,  the  guided  precomponents 
of  weak,  strong  cyclic,  and  strong  planning  are  defined.  Section  5.3  describes  a  range  of 
experimental  results  showing  that  the  new  algorithms  may  dramatically  reduce  both  the 
search  time  and  the  plan  size  compared  with  the  current  algorithms.  Finally,  Section  5.4 
draws  conclusions. 


5.1  Guided  Non-Deterministic  Planning 

As  described  in  the  previous  chapter,  the  state-set  branching  framework  has  two  indepen¬ 
dent  parts:  a  modification  of  the  best-first  search  algorithm  to  expanding  sets  of  states  in 
each  iteration  and  a  specialized  partitioning  technique  called  branching  partitioning  to  im¬ 
plement  the  new  algorithm  efficiently  with  BDDs.  A  key  observation  is  that  branching 
partitioning  also  can  be  used  to  propagate  search  control  information  between  states  in 
non-deterministic  domains.  This  follows  directly  from  the  fact  that  branching  partition¬ 
ing  is  defined  at  the  transition  level  and  therefore  is  independent  of  whether  actions  are 
deterministic.  The  major  difference  when  considering  non-deterministic  planning  is  that 
it  seems  to  be  very  hard,  if  not  impossible,  to  cast  the  generic  non-deterministic  planning 
algorithm  NDP  shown  in  Figure  3.8  as  a  search  tree  algorithm.  The  problem  is  to  guar¬ 
antee  completeness  for  strong  and  strong-cyclic  planning.  Consider  for  example  using  a 
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search  tree  to  generate  a  strong  plan.  Assume  that  the  algorithm  at  some  point  during  the 
search  adds  a  node  n  with  states  P  from  the  frontier  of  the  search  tree  to  the  plan.  Let  C 
denote  the  set  of  states  covered  by  the  plan  after  this  incrementation.  The  algorithm  then 
computes  the  child  nodes  of  n.  This  can  be  done  by  finding  state-action  pairs  (SAs)  that 
can  reach  P  in  one  step  and  form  a  subset  of  a  strong  precomponent  of  C.  Assume  that  the 
algorithm  in  the  next  iteration  expands  a  node  that  is  not  a  child  node  of  n.  Let  C'  denote 
the  set  of  states  covered  by  the  plan  after  adding  the  states  of  this  node  to  the  plan.  This, 
however,  may  affect  the  child  nodes  of  n.  If  the  child  nodes  are  recomputed  with  respect  to 
C' and not C, they may contain a larger set of SAs since C' ⊃ C. This makes the algorithm
incomplete  since  the  child  nodes  of  n  are  computed  only  once.  A  similar  problem  exists 
for  strong-cyclic  planning.  For  weak  planning,  on  the  other  hand,  it  is  possible  to  define 
a complete search tree algorithm since the set of SAs of the child nodes is independent of the set of states in the plan (defined in [95]). In this presentation, though, we propose a
general  framework  for  pure  heuristic  non-deterministic  planning  called  non-deterministic 
state-set  branching  where  a  heuristic  function  is  used  to  select  a  subset  of  the  blind  pre¬ 
component  in  each  iteration.  Non-deterministic  state-set  branching  is  based  on  the  generic 
guided  non-deterministic  planning  algorithm  GNDP  shown  in  Figure  5.1. 

function GNDP(s0, G, hg)
1  P ← ∅; C ← emptyMap; C[hg] ← G
2  while s0 ∉ C
3      Pc ← GPreComp(C)
4      if |Pc| = 0 then return “no solution exists”
5      P ← P ∪ Pc
6      for k = 1 to |Pc|
7          C[hk] ← C[hk] ∪ States(Pc[hk])
8  return P


Figure 5.1: A generic guided algorithm for synthesizing non-deterministic plans.


The  GNDP  algorithm  is  similar  to  NDP.  The  main  difference  is  that  it  keeps  the  set 
of  states  covered  by  the  plan  in  a  map  C.  The  purpose  of  the  map  is  to  partition  the 
covered  states  with  respect  to  the  value  of  a  heuristic  function  that  for  a  state  s  estimates 
the  minimum  length  of  a  path  from  s0  to  s.  In  each  iteration,  a  guided  precomponent  Pc 
is  computed  and  added  to  the  plan.  The  precomponent  function  must  be  valid  according  to 
Definition  5.1. 

Definition 5.1 (Guided Precomponent Function) If C associates states with their correct h-value then a guided precomponent function GPreComp(C) is valid iff

• $\mathrm{GPreComp} : (\mathbb{R}^+ \to 2^S) \to (\mathbb{R}^+ \to 2^{S \times Act})$,

• GPreComp(C) terminates,

• if $P_c = \mathrm{GPreComp}(C)$ then for any $\langle s, a \rangle \in P_c[h_i]$, we have $s \notin C$ and the h-value of s is $h_i$.

The  completeness  of  the  algorithm  is  due  to  the  fact  that  the  precomponent  computation 
does  not  rely  on  previous  computations  that  may  have  become  outdated.  A  limitation  of 
GNDP is that all goal states are assumed to have identical h-value (hg). It is easy, though, to generalize the algorithm to take a set of goal states partitioned with respect to h-value as input. This has not been done here in order to simplify the presentation.
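
The main loop of Figure 5.1 can be illustrated with explicit sets instead of BDDs. The following is only a minimal sketch under that assumption: the names `gndp`, `toy_precomp`, and `TRANSITIONS` are hypothetical, and the toy precomponent function is an unguided stand-in that merely satisfies the validity conditions of Definition 5.1 (it returns only uncovered states, keyed by their h-value, here h(s) = s).

```python
from typing import Callable, Dict, Hashable, Iterable, Optional, Set, Tuple

State = Hashable
SA = Tuple[State, str]              # a state-action pair
Cover = Dict[int, Set[State]]       # h-value -> covered states
Pre = Dict[int, Set[SA]]            # h-value -> state-action pairs


def gndp(s0: State, goals: Iterable[State], hg: int,
         gprecomp: Callable[[Cover], Pre]) -> Optional[Set[SA]]:
    """Explicit-set rendering of GNDP; None stands for "no solution exists"."""
    plan: Set[SA] = set()
    cover: Cover = {hg: set(goals)}
    while not any(s0 in states for states in cover.values()):
        pc = gprecomp(cover)            # recomputed from scratch each iteration
        if not pc:                      # |Pc| = 0
            return None
        for h, sas in pc.items():
            plan |= sas
            cover.setdefault(h, set()).update(s for s, _ in sas)
    return plan


# Hypothetical toy domain: a chain 0 -> 1 -> 2 -> 3 with a single action "go".
TRANSITIONS = {(0, "go", 1), (1, "go", 2), (2, "go", 3)}


def toy_precomp(cover: Cover) -> Pre:
    """Unguided weak precomponent, partitioned by the assumed heuristic h(s) = s."""
    covered = set().union(*cover.values())
    pre: Pre = {}
    for s, a, t in TRANSITIONS:
        if t in covered and s not in covered:
            pre.setdefault(s, set()).add((s, a))
    return pre


chain_plan = gndp(0, {3}, 3, toy_precomp)
```

Because the precomponent is recomputed from the current cover in every iteration, no stale child nodes are kept around, which is the property the completeness argument above relies on.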

Theorem  5.1  (Termination  of  GNDP)  GNDP  terminates. 

Proof. Given in Appendix B. □


5.2  Guided  Precomponents 

In  this  section,  we  introduce  the  guided  precomponents  for  weak,  strong  cyclic,  and  strong 
planning  used  by  GNDP.  For  weak  and  strong  planning,  the  guided  precomponent  is 
the set of state-action pairs (SAs) in a complete blind precomponent that has states with minimum h-value. In both cases, this strategy results in a pure heuristic search, since only
the  heuristic  estimate  of  the  distance  to  the  initial  state  is  used  to  guide  the  search.  For 
strong-cyclic  planning,  the  guided  precomponent  is  computed  from  a  set  of  candidate  SAs 
built  from  a  search  tree  of  weak  precomponents  grown  from  the  set  of  states  covered  by  the 
plan. In order to avoid an overly “narrow” candidate set, both the heuristic value of the states and the depth in the tree are taken into account when choosing a node to expand. After each
expansion  of  the  candidate  set,  the  SCPlanAux  function  defined  in  Section  3.2.2  is  used 
to  extract  a  strong  cyclic  precomponent  from  the  candidate,  if  possible. 

Similarly  to  blind  non-deterministic  planning,  the  core  operation  in  guided  non-determi- 
nistic  planning  is  to  find  the  preimage  of  a  set  of  states.  However,  we  also  need  to  split  the 
SAs in the preimage according to the h-value of the states. As for deterministic state-set
branching,  we  can  use  a  disjunctive  branching  partitioning  to  compute  and  split  the  preim¬ 
age  in  a  single  operation.  Assume  that  the  disjunctive  branching  partitioning  is  of  the  form 

$$R_1(\vec{v}, \vec{a}, \vec{y}'_1), \ldots, R_n(\vec{v}, \vec{a}, \vec{y}'_n)$$

where the h-value change (in the forward direction) of transitions in subrelation i is $\delta h_i$. The preimage of subrelation i is then given by

$$\mathrm{PreImgSA}_i(C) = \exists \vec{y}'_i .\; R_i(\vec{v}, \vec{a}, \vec{y}'_i) \wedge C(\vec{v})[\vec{y}_i / \vec{y}'_i]. \qquad (5.1)$$

Thus, a complete partitioned preimage is found by computing PreImgSA_i(C) for each subrelation in turn and merging partitions with identical h-value.
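
To make the bookkeeping concrete, the partitioned preimage can be sketched with explicit transition sets in place of BDDs. This is an illustration only: `partitioned_preimage` and `SUBRELATIONS` are hypothetical names, each subrelation is given as a pair of its forward h-value change and its transitions, and the toy heuristic is h(s) = s. Since the preimage moves backward over a transition, SAs obtained from a subrelation with forward change dh and target states of value h receive the value h - dh.

```python
from typing import Dict, Hashable, List, Set, Tuple

State = Hashable
SA = Tuple[State, str]
Transition = Tuple[State, str, State]


def partitioned_preimage(subrelations: List[Tuple[int, Set[Transition]]],
                         states: Set[State], h: int) -> Dict[int, Set[SA]]:
    """Preimage of `states` (all with h-value h), split by h-value of the SAs."""
    result: Dict[int, Set[SA]] = {}
    for dh, rel in subrelations:
        # Explicit-set analogue of PreImgSA_i: SAs with an outcome in `states`.
        sas = {(s, a) for (s, a, t) in rel if t in states}
        if sas:
            # Partitions with identical h-value are merged.
            result.setdefault(h - dh, set()).update(sas)
    return result


# Hypothetical toy relation over integer states with h(s) = s.
SUBRELATIONS = [
    (1, {(0, "right", 1), (1, "right", 2)}),    # forward change +1
    (0, {(0, "right", 0), (1, "right", 1)}),    # "right" may also stay in place
    (-1, {(1, "left", 0), (2, "left", 1)}),     # forward change -1
    (1, {(0, "jump", 1)}),                      # a second subrelation with +1
]

pre_of_one = partitioned_preimage(SUBRELATIONS, {1}, 1)
```

In this toy relation the two subrelations with change +1 both contribute to the partition with h-value 0, which shows the merging step.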


5.2.1  Guided  Weak  Precomponents 

The  main  computation  of  the  guided  weak  precomponent  is  to  find  the  preimage  of  a  set  of 
states S and prune it for SAs where the state is in C:

$$\mathrm{PreCompW}_i(C, S) = \mathrm{PreImgSA}_i(S) \setminus C \times Act. \qquad (5.2)$$

The  algorithm  computing  the  guided  weak  precomponent  is  shown  in  Figure  5.2.  The  input 

function GPreCompW(C)
1  Q ← emptyQueue
2  for j = 1 to |C|
3      for i = 1 to n
4          wSA ← PreCompW_i(C, C[hj])
5          Q ← Insert(Q, ⟨wSA, hj − δh_i⟩)
6  if |Q| = 0 then return emptyMap
7  else return RemoveTop(Q)


Figure  5.2:  The  guided  weak  precomponent. 


to the function is a map of covered states C where each entry contains a set of states with identical h-value. A priority queue Q stores the weak precomponents of C. The keys of Q are the h-values of the sets of SAs forming the entries of Q. As usual, the keys of Q are sorted in ascending order and a node inserted in Q is merged with any existing node with identical h-value. RemoveTop(Q) returns a map with the top of Q as its only element.
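
The guided weak precomponent can be sketched in the same explicit-set style. The code below is a hedged illustration, not the BDD implementation: `gprecomp_weak` and `SUBRELATIONS` are hypothetical names, the priority queue with merging is simplified to a dictionary keyed by h-value, and empty sets of SAs are assumed not to be inserted in the queue.

```python
from typing import Dict, Hashable, List, Set, Tuple

State = Hashable
SA = Tuple[State, str]
Transition = Tuple[State, str, State]
Cover = Dict[int, Set[State]]


def gprecomp_weak(cover: Cover,
                  subrelations: List[Tuple[int, Set[Transition]]]
                  ) -> Dict[int, Set[SA]]:
    """Return the weak precomponent partition with minimum h-value, or {}."""
    covered = set().union(*cover.values())
    queue: Dict[int, Set[SA]] = {}      # h-value -> merged SAs
    for hj, states in cover.items():
        for dh, rel in subrelations:
            # PreCompW_i: preimage of C[hj], pruned for states already in C.
            wsa = {(s, a) for (s, a, t) in rel
                   if t in states and s not in covered}
            if wsa:
                queue.setdefault(hj - dh, set()).update(wsa)
    if not queue:
        return {}
    top = min(queue)                    # keys are sorted in ascending order
    return {top: queue[top]}


# Hypothetical toy relation over integer states with h(s) = s.
SUBRELATIONS = [
    (1, {(0, "right", 1), (1, "right", 2)}),
    (-1, {(1, "left", 0), (2, "left", 1)}),
]

first = gprecomp_weak({1: {1}}, SUBRELATIONS)
second = gprecomp_weak({1: {1}, 0: {0}}, SUBRELATIONS)
third = gprecomp_weak({1: {1}, 0: {0}, 2: {2}}, SUBRELATIONS)
```

Starting from the covered state 1, the sketch first selects the SA of state 0 (h-value 0) and only afterwards the SA of state 2 (h-value 2), mirroring the pure heuristic ordering.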

Let  GuidedWeak  denote  the  GNDP  algorithm  using  the  guided  weak  precomponent 
function.  It  can  be  shown  that  GuidedWeak  is  sound,  complete  and  terminating.  How¬ 
ever,  since  a  pure  heuristic  search  strategy  is  employed,  solutions  are  not  guaranteed  to  be 
weak  distance  optimal  like  solutions  computed  with  the  Weak  algorithm. 

Theorem 5.2 (Correctness of GuidedWeak) The GuidedWeak planning algorithm is
correct.  The  algorithm  returns  “no  solution  exists”  iff  no  solution  exists,  otherwise  it  re¬ 
turns  a  valid  solution. 

Proof.  This  follows  from  the  soundness,  completeness,  and  termination  theorems  of  Guid¬ 
edWeak  proven  in  Appendix  B.  □ 

5.2.2  Guided  Strong  Precomponents 

The main computation of the guided strong precomponent is to find the preimage of a set of states S and prune it for SAs where either the state is in C or the SA can lead outside of C:

$$\mathrm{PreCompS}_i(C, S) = (\mathrm{PreImgSA}_i(S) \setminus \mathrm{PreImgSA}(\overline{C})) \setminus C \times Act. \qquad (5.3)$$

The algorithm computing the guided strong precomponent is shown in Figure 5.3. It is similar to GPreCompW except that the function PreCompS_i is used instead of PreCompW_i.


function GPreCompS(C)
1  Q ← emptyQueue
2  for j = 1 to |C|
3      for i = 1 to n
4          sSA ← PreCompS_i(C, C[hj])
5          Q ← Insert(Q, ⟨sSA, hj − δh_i⟩)
6  if |Q| = 0 then return emptyMap
7  else return RemoveTop(Q)


Figure  5.3:  The  guided  strong  precomponent. 


Let  GuidedStrong  denote  the  GNDP  algorithm  using  the  guided  strong  precompo¬ 
nent  function.  It  can  be  shown  that  GuidedStrong  is  sound,  complete  and  terminating. 
However,  similarly  to  GuidedWeak,  since  a  pure  heuristic  search  strategy  is  employed, 
solutions  are  not  guaranteed  to  be  strong  distance  optimal  like  solutions  computed  with  the 
Strong  algorithm. 
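
The difference between the weak and the strong pruning can be shown on a small explicit example. The sketch below follows the prose description of the strong precomponent (an SA is pruned when its state is in C or when one of its outcomes lies outside C); the names `precomp_weak`, `precomp_strong`, and `TRANSITIONS` are hypothetical, and a single subrelation is used for brevity.

```python
from typing import Hashable, Set, Tuple

State = Hashable
SA = Tuple[State, str]
Transition = Tuple[State, str, State]


def precomp_weak(rel: Set[Transition], covered: Set[State],
                 states: Set[State]) -> Set[SA]:
    """SAs with some outcome in `states` whose state is not yet covered."""
    return {(s, a) for (s, a, t) in rel if t in states and s not in covered}


def precomp_strong(rel: Set[Transition], all_transitions: Set[Transition],
                   covered: Set[State], states: Set[State]) -> Set[SA]:
    """As precomp_weak, but also prune SAs that can lead outside `covered`."""
    escaping = {(s, a) for (s, a, t) in all_transitions if t not in covered}
    return precomp_weak(rel, covered, states) - escaping


# Hypothetical domain: "safe" always reaches state 1, "risky" may end in 2.
TRANSITIONS = {(0, "safe", 1), (0, "risky", 1), (0, "risky", 2)}

weak_sas = precomp_weak(TRANSITIONS, {1}, {1})
strong_sas = precomp_strong(TRANSITIONS, TRANSITIONS, {1}, {1})
```

Here the weak precomponent keeps both actions of state 0, while the strong precomponent keeps only the action whose every outcome stays within the covered states.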

Theorem  5.3  (Correctness  of  GuidedStrong)  The  GuidedStrong  planning  algorithm 
is  correct.  The  algorithm  returns  “no  solution  exists”  iff  no  solution  exists,  otherwise  it 
returns  a  valid  solution. 

Proof. This follows from the soundness, completeness, and termination theorems of GuidedStrong proven in Appendix B. □

5.2.3  Guided  Strong  Cyclic  Precomponents 

The  guided  strong  cyclic  precomponent  is  fairly  different  from  the  weak  and  strong  pre¬ 
components.  At  each  call,  the  algorithm  builds  a  set  of  candidate  SAs  from  a  search  tree 
of  weak  precomponents  grown  from  the  set  of  covered  states  C.  For  each  extension  of 
the  candidate  set,  the  SCPlanAux  function  defined  in  Section  3.2.2  is  called  to  extract  a 
strong  cyclic  precomponent  from  the  candidate,  if  possible.  The  search  queue  Q  stores  the 

function GPreCompSC(C)
1   Q ← emptyQueue
2   for j = 1 to |C|
3       for i = 1 to n
4           cSA ← PreCompW_i(C, C[hj])
5           Q ← Insert(Q, ⟨cSA, 1, hj − δh_i⟩)
6   wSA ← ∅; wS ← emptyMap
7   repeat
8       if |Q| = 0 then return emptyMap
9       ⟨pSA, d, h⟩ ← RemoveTop(Q)
10      pSA ← pSA \ wSA
11      if pSA ≠ ∅ then
12          pS ← States(pSA)
13          wS[h] ← wS[h] ∪ pS
14          for i = 1 to n
15              cSA ← PreCompW_i(C, pS)
16              Q ← Insert(Q, ⟨cSA, d + 1, h − δh_i⟩)
17          wSA ← wSA ∪ pSA
18      scSA ← SCPlanAux(wSA, C)
19  until scSA ≠ ∅
20  Pc ← emptyMap
21  for k = 1 to |wS|
22      Pc[hk] ← wS[hk] ∩ scSA
23  return Pc

Figure 5.4: The guided strong cyclic precomponent.

frontier nodes of a search tree of weak precomponents generated from the states C in the current plan. Each node is associated with its h-value as usual. For this algorithm, however, we also associate a node with its depth d in the search tree. The highest priority is given to nodes with the smallest sum of h and d. In case of a tie, the highest priority is given to the node with the smallest depth. When inserting a new node in Q, it will only be merged with an existing node in Q if this node has identical h and d values. For each call to GPreCompSC, a new search tree is generated. First, the weak precomponents of the states C covered by the current plan are added to Q (l. 1-5). These are all at depth 1 in the tree. Then, the candidate SAs wSA for the strong cyclic precomponent and auxiliary variables are initialized (l. 6). The repeat loop (l. 7-19) performs a guided version of the expansion and pruning of wSA carried out by PreCompSC(C). The effect of taking the depth in the search tree into account is that the set of candidate SAs does not form a narrow beam in the state space that is unlikely to contain a strong cyclic precomponent. When a non-empty strong cyclic precomponent scSA is found, the SAs of this precomponent are partitioned with respect to their h-value (l. 20-22) and the resulting map is returned.
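
The queue discipline described above (smallest h + d first, ties broken by smallest depth, merging only on identical h and d) can be sketched as follows. The class name `FrontierQueue` and its methods are hypothetical, the sets of SAs are explicit rather than BDDs, and it is assumed that empty sets are not stored.

```python
from typing import Dict, Hashable, Set, Tuple

SA = Tuple[Hashable, str]


class FrontierQueue:
    """Frontier nodes keyed by (h, d); top is min h + d, then min d."""

    def __init__(self) -> None:
        self._nodes: Dict[Tuple[int, int], Set[SA]] = {}

    def insert(self, sas: Set[SA], d: int, h: int) -> None:
        if sas:
            # Merge only with an existing node that has identical h and d.
            self._nodes.setdefault((h, d), set()).update(sas)

    def remove_top(self) -> Tuple[Set[SA], int, int]:
        h, d = min(self._nodes, key=lambda hd: (hd[0] + hd[1], hd[1]))
        return self._nodes.pop((h, d)), d, h

    def __len__(self) -> int:
        return len(self._nodes)


q = FrontierQueue()
q.insert({("s1", "a")}, 1, 4)   # h + d = 5, depth 1
q.insert({("s2", "b")}, 3, 2)   # h + d = 5, depth 3
q.insert({("s3", "c")}, 2, 2)   # h + d = 4
q.insert({("s4", "d")}, 1, 4)   # merged with the first node
order = [q.remove_top() for _ in range(len(q))]
```

In this example the node with sum 4 is removed first, and of the two nodes with sum 5 the shallower one is preferred, which is what keeps the candidate set from degenerating into a narrow beam.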

Let GuidedStrongCyclic denote the GNDP algorithm using the guided strong cyclic precomponent function. It can be shown that GuidedStrongCyclic is sound, complete and terminating.

Theorem  5.4  (Correctness  of  GuidedStrongCyclic)  The  GuidedStrongCyclic  plan¬ 
ning  algorithm  is  correct.  The  algorithm  returns  “no  solution  exists”  iff  no  solution  exists, 
otherwise  it  returns  a  valid  solution. 

Proof.  This  follows  from  the  soundness,  completeness,  and  termination  theorems  of  Guid¬ 
edStrongCyclic  proven  in  Appendix  B.  □ 


5.3  Experimental  Results 

The  performance  of  the  guided  non-deterministic  planning  algorithms  has  been  evaluated 
in  three  non-deterministic  domains  and  two  deterministic  domains.  We  include  the  deter¬ 
ministic  domains  since  only  a  limited  number  of  parameterized  non-deterministic  domains 
with efficient search heuristics have been modeled so far. The deterministic domains may
provide  some  information  about  robustness  of  the  search  approach  of  non-deterministic 
state-set  branching  to  different  heuristics. 

All  experiments  are  carried  out  using  the  BIFROST  0.7  search  engine  and  the  experi¬ 
mental  setting  described  in  Appendix  A.  As  usual,  n  denotes  the  number  of  BDD-nodes 
allocated  to  represent  the  shared  BDD,  and  c  denotes  the  number  of  BDD  nodes  allocated 
to  represent  BDDs  in  the  operator  caches  used  to  implement  dynamic  programming.  Total 
CPU time includes time spent on allocating memory for the BDD package, parsing the problem description and, in the case of PDDL problems, analysing the problem in order to make a compact Boolean state encoding. The time-out varies between the experiments. The algorithms are considered out of memory when they start page faulting to the hard drive at approximately 450 MB RAM.

5.3.1  Non-Deterministic  Domains 

The  three  non-deterministic  domains  are  a  non-deterministic  version  of  the  8-Puzzle,  a  real- 
world  steel  producing  plant  of  SIDMAR  in  Ghent,  Belgium  used  as  an  ESPRIT  case  study 
[56],  and  a  real-world  domain  for  Power  Supply  Restoration  (PSR)  introduced  in  [162]. 

Non-Deterministic  8-Puzzle 

To  make  the  8-Puzzle  domain  non-deterministic,  we  assume  that  up  and  down  moves  of  the 
blank  space  may  move  left  and  right  as  well,  as  shown  in  Figure  5.5.  Left  and  right  moves 
are  deterministic  in  order  to  ensure  that  a  strong  plan  exists  for  any  reachable  initial  state. 
However, we only consider actions that reduce the distance to the initial state by at most one. Otherwise, the sum of Manhattan distances becomes too conservative a heuristic.

Figure 5.5: Moves of the blank space and their possible outcomes.
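
For reference, the sum of Manhattan distances can be computed as in the following sketch. The function name `manhattan_sum` and the board encoding (a tuple of nine entries in row-major order with 0 for the blank) are assumptions made for illustration; since the search is backward, the reference configuration would be the initial state.

```python
from typing import Sequence


def manhattan_sum(state: Sequence[int], reference: Sequence[int],
                  width: int = 3) -> int:
    """Sum over all tiles (blank excluded) of the grid distance between the
    tile's position in `state` and its position in `reference`."""
    total = 0
    for tile in state:
        if tile == 0:                   # the blank space is not counted
            continue
        i, j = state.index(tile), reference.index(tile)
        total += abs(i // width - j // width) + abs(i % width - j % width)
    return total


initial = (1, 2, 3, 4, 5, 6, 7, 8, 0)
one_move_away = (1, 2, 3, 4, 5, 6, 7, 0, 8)
estimate = manhattan_sum(one_move_away, initial)
```

A single move changes the position of exactly one tile by one cell, so the estimate changes by at most one per move, which matches the restriction on actions stated above.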


We  consider  problems  where  the  minimum  length  of  a  path  from  the  initial  state  to  the 
goal state grows linearly from 8 to 23. The BDD package was initialized with n = 4M and c = 700K. The threshold for merging partitions in the disjunctive transition relation partitioning was 5000. Memory allocation and transition relation construction took 1.56 and 1.34 seconds respectively for all experiments. The results are shown in Figure 5.6. Each
data  point  shown  in  the  graphs  is  the  average  of  3  computational  results.  The  results  show 
a  dramatic  positive  impact  of  guiding  the  search  for  all  three  algorithms.  As  depicted  in  the 
graphs,  it  may  reduce  not  only  the  total  computation  time  but  also  the  size  of  the  produced 
plans.  This  may  be  somewhat  surprising  since  the  guided  algorithms  apparently  repeat  a 
large  number  of  computations.  The  previous  results  of  such  recomputations,  however,  may 
often  be  stored  in  the  operator  cache  of  the  BDD  package  and  may  therefore  not  cause  a 
significant  computation  overhead. 

SIDMAR 

The  SIDMAR  domain  is  an  abstract  model  of  a  real-world  steel  producing  plant  in  Ghent, 
Belgium  used  as  an  ESPRIT  case  study  [56].  The  layout  of  the  steel  plant  is  shown  in 
Figure 5.7. The goal is to cast steel of different qualities. Pig iron is poured portion-wise into ladles by the two converter vessels. The ladles can move autonomously on the two east-west tracks. However, two ladles cannot pass each other and there can be at most one ladle
between  machines.  Ladles  are  moved  in  the  north-south  direction  by  the  two  overhead 
cranes.  The  pig  iron  must  be  treated  differently  to  obtain  steel  of  different  qualities.  Before 
empty  ladles  are  moved  to  the  storage  place,  the  steel  is  cast  by  the  continuous  casting 
machine.  A  ladle  can  only  leave  the  casting  machine  if  there  already  is  a  filled  ladle  at  the 
holding place. The actions of machines 1, 2, 4, and 5 are non-deterministic. They may either
cause  the  steel  in  the  ladles  to  be  treated  or  the  machine  to  break.  To  ensure  that  a  strong 
plan  exists,  actions  have  been  added  to  the  domain  that  can  fix  failed  machines. 

We  consider  producing  steel  from  two  ladles.  They  both  need  an  initial  treatment  on 
machine  1  or  4  and  2  or  5.  One  of  the  ladles  in  addition  needs  a  treatment  on  machine  3 
and  a  final  treatment  on  machine  2  or  5  before  being  cast.  Non-determinism  is  caused  by 
machine  failures.  We  consider  6  problems  where  the  goal  states  correspond  to  situations 
with  growing  distances  from  the  initial  state  during  the  production  of  these  two  ladles.  The 
number  of  completed  treatments  is  used  as  heuristic  function.  Notice  that  this  heuristic  is 
relatively  weak  compared  to  the  sum  of  Manhattan  distances  used  for  the  8-Puzzle  since 
it  severely  underestimates  the  minimum  distance  to  the  initial  state.  The  BDD  package 
was initialized with n = 8M and c = 700K. The threshold for merging partitions in the disjunctive transition relation partitioning was 5000. Memory allocation and transition relation construction took 2.34 and 0.22 seconds respectively for all experiments. The results are shown in Figure 5.8. Again, we observe a large performance gain obtained by guiding the search for all three algorithms both in terms of total CPU time and the plan size. These results are encouraging since SIDMAR is a real-world domain and non-determinism is caused by realistic faults. In addition, the results demonstrate that even a very weak heuristic may have a substantial positive effect on performance.

Figure 5.6: Results of the non-deterministic 8-Puzzle experiments.

Figure 5.7: Layout of the SIDMAR steel plant.

PSR 

The  Power  Supply  Restoration  domain  (PSR)  is  a  network  of  electric  lines  connected  via 
switching  devices  (SDs),  and  fed  via  circuit-breakers  (CBs).  Switching  devices  and  circuit 
breakers  can  either  be  open  or  closed.  A  circuit-breaker  supplies  power  when  it  is  closed, 
and  a  switching  device  stops  the  power  propagation  if  it  is  open.  Consumers  may  be  located 
on  any  line  and  are  supplied  only  when  the  line  is  supplied.  We  assume  that  each  closed 
circuit-breaker  forms  a  feeder.  A  feeder  is  a  tree  consisting  of  closed  switching  devices  and 
lines reachable downstream from the circuit-breaker. The leaves are open switching devices and dead-end lines.

Example 5.1 The “simple” PSR domain investigated in [14] is shown in Figure 5.9. In the depicted configuration, it only has a single feeder rooted in CB 2. ◊

Figure 5.8: Results of the SIDMAR experiments.

Figure 5.9: The “simple” PSR domain studied in [14]. A filled box denotes that the associated circuit-breaker or switching device is closed. Supplied and unsupplied lines are drawn solid and dashed, respectively.


In  the  original  definition  of  PSR  domains,  each  unit  in  the  system  may  fail.  Lines  may 
short  circuit,  and  switches  may  get  stuck  in  one  of  their  two  positions.  In  addition,  states 
are assumed only to be partially observable. We consider a simplified version of the domain where states are fully observable and lines do not fail. The actions of the simplified domain are to open and close switching devices and circuit-breakers. The actions are non-deterministic: they may open and close these units as intended or cause the units to break permanently and get stuck in their current position.

The studied networks are of the linear form shown in Figure 5.10 with n ranging from 5 to 35. Initially, every unit is open and the goal is to feed each line. Since any combination of errors may happen, neither a strong nor a strong cyclic solution exists. The heuristic used to guide the search is the number of lines with power. The results of the experiment are shown in Figure 5.11. The BDD package was initialized with n = 15M and c = 500K. The threshold for merging partitions in the disjunctive transition relation partitioning was 5000. Memory allocation took 3.41 seconds for all experiments. Transition relation construction took between 0.05 and 1.21 seconds. Again, we see a substantial positive impact of guiding the search.

Figure 5.10: The linear PSR networks used for experiments.

Figure 5.11: Results of the guided weak algorithm on the linear PSR problems.

5.3.2  Deterministic  Domains 

The  two  deterministic  domains  are  the  logistics  domain  described  in  Section  4.3.2  and  the 
ZenoTravel  domain  described  in  Section  4.3.2. 

Logistics 

For  the  logistics  domain,  we  again  study  the  problems  from  the  STRIPS  track  of  the  AIPS 
2000  planning  competition  and  use  the  HSPr  heuristic  to  guide  the  search  (recall  that  this  is 
a heuristic for backward search). The results of the experiment are shown in Figure 5.12. The BDD package was initialized with n = 12M and c = 400K. The threshold for merging
partitions  in  the  disjunctive  transition  relation  partitioning  was  5000.  Memory  allocation 
took  1.9  seconds  for  all  experiments.  Transition  relation  construction  took  between  0.10 
and  0.77  seconds.  The  results  show  a  significant  performance  improvement  for  each  algo¬ 
rithm.  However,  since  the  domain  is  deterministic  the  blind  weak,  strong  cyclic  and  strong 
precomponents are identical. Thus, we may expect the guided precomponents and the solutions returned by the guided weak, strong cyclic and strong algorithms to be identical.
This  seems  to  be  the  case.  Even  though  each  algorithm  computes  fairly  similar  solutions, 
the  results  bring  further  evidence  that  the  general  guiding  strategy  employed  by  the  algo¬ 
rithms  has  good  performance  for  a  wide  range  of  heuristics.  In  addition,  the  results  show 
that the computational overhead of the algorithms is fairly similar. Not surprisingly, the weak
algorithm  seems  to  have  the  smallest  overhead. 

Figure 5.12: Results of the Logistics experiments.

Figure 5.13: Results of the ZenoTravel experiments.

ZenoTravel

For  the  ZenoTravel  domain,  we  again  study  the  problems  from  the  STRIPS  track  of  the 
AIPS  2002  planning  competition  and  use  the  HSPr  heuristic  to  guide  the  search.  The 
results of the experiment are shown in Figure 5.13. The BDD package was initialized with n = 12M and c = 400K. The threshold for merging partitions in the disjunctive transition relation partitioning was 5000. Memory allocation took 2.70 seconds for all experiments. Transition relation construction took between 0.11 and 18.03 seconds. The results and their
interpretation  are  similar  to  the  Logistics  domain. 


5.4  Conclusion 

Our  investigation  of  non-deterministic  state-set  branching  has  shown  that  it  is  possible 
to  employ  branching  partitionings  in  non-deterministic  domains  and  define  pure  heuristic 
search  strategies  of  the  weak,  strong  cyclic,  and  strong  planning  algorithms.  The  exper¬ 
imental  results  show  that  non-deterministic  state-set  branching  can  lead  to  large  perfor¬ 
mance  gains  compared  to  blind  search,  not  only  in  terms  of  CPU  time  but  also  in  terms 
of  plan  size.  The  main  limitation  of  the  approach  is  that  no  optimality  guarantees  can  be 
given for weak and strong solutions. Another limitation is that, in order to provide completeness, the algorithms perform a large number of recomputations when building a complete breadth-first search frontier in each iteration. Such recomputations may be avoided due to the extensive caching of previous results by the BDD package. However, we cannot rely on unstructured caching for memory-intensive problems. An interesting direction for future
work  is  to  define  a  method  for  maintaining  a  complete  search  frontier  instead  of  simply 
recomputing  it. 


5.5  Summary 

In  this  chapter,  we  have  seen  how  the  core  ideas  of  state-set  branching  can  be  applied  to 
non-deterministic  planning.  Pure  heuristic  versions  of  the  weak,  strong  cyclic,  and  strong 
algorithms  have  been  developed  and  formally  proven  to  be  correct.  A  range  of  experimen¬ 
tal  results  in  three  non-deterministic  and  two  deterministic  domains  using  four  different 
heuristics  show  that  the  new  guided  algorithms  may  have  dramatically  better  performance 
than  the  ordinary  blind  algorithms  both  in  terms  of  search  time  and  solution  size. 


Chapter 6

Fault  Tolerant  Planning 


In  the  previous  two  chapters,  the  focus  has  been  on  lowering  the  complexity  of  BDD-based 
deterministic and non-deterministic search. In this and the following chapter, we will shift
focus  and  introduce  extensions  to  the  non-deterministic  domain  model  in  order  to  improve 
the  quality  of  the  produced  solutions.  A  key  observation  is  that  non-determinism  in  the  real 
world  often  is  caused  by  infrequent  errors  that  make  otherwise  deterministic  actions  fail. 
In  many  cases,  no  actions  can  be  guaranteed  to  succeed.  For  such  domains,  it  may  be  hard 
or  even  impossible  to  generate  plans  that  can  recover  from  any  combination  of  errors.  In 
this  chapter,  we  propose  a  new  framework  called  fault  tolerant  planning  [96]  to  handle  this 
kind  of  non-determinism. 

Section  6.1  defines  fault  tolerant  planning  domains  and  n-fault  tolerant  plans  that  are 
robust  to  n  failures  occurring  during  the  execution  of  the  plan.  In  Section  6.2,  we  show 
how  optimal  n-fault  tolerant  plans  can  be  generated  with  the  strong  algorithm.  Due  to 
non-local  error  states  it  turns  out,  however,  to  be  hard  to  guide  the  search  efficiently  with 
non-deterministic  state-set  branching  when  using  this  approach.  We  therefore  develop  a 
specialized  guided  1-fault  tolerant  planning  algorithm  1-GFTP  that  decouples  the  guiding 
toward  error  states  and  guiding  toward  the  initial  state.  In  Section  6.3,  we  present  a  range 
of  experimental  results  that  show  that  the  specialized  algorithm  indeed  may  be  necessary 
for  solving  real-world  problems  efficiently.  Finally,  Section  6.4  draws  conclusions  and 
discusses  directions  for  future  work. 


6.1  N-Fault  Tolerant  Planning  Problems 

Fault tolerant planning assumes that actions have primary and secondary effects. The primary effect models the usual deterministic behavior of the action, while the secondary effect models the error effects. N-fault tolerant plans are robust to n errors or faults occurring during the execution of the plan. This definition of fault tolerance is closely connected to fault tolerance concepts in control theory and engineering. Every time we board a two-engined aircraft, we enter a 1-fault tolerant system: a single engine failure is recoverable,
but  two  engines  failing  may  lead  to  an  unrecoverable  breakdown  of  the  system. 

An  n-fault  tolerant  plan  is  not  as  restrictive  as  a  strong  plan  that  requires  that  the  goal 
can  be  reached  in  a  finite  number  of  steps  independent  of  the  number  of  errors.  In  many 
cases,  a  strong  plan  does  not  exist  because  all  possible  errors  must  be  taken  into  account. 
This  is  not  the  case  for  fault  tolerant  plans,  and  if  errors  are  infrequent,  they  may  still  be 
very  likely  to  succeed.  A  fault  tolerant  plan  is  also  not  as  restrictive  as  a  strong  cyclic  plan. 
An execution of a strong cyclic plan will never reach states not covered by the plan unless they are goal states. Thus, strong cyclic plans also have to take all error combinations into account. Weak plans, on the other hand, are more relaxed than fault tolerant plans. Fault tolerant plans, however, are almost always preferable to weak plans because weak plans give no guarantees for all the possible outcomes of actions. For fault tolerant plans, any action may
fail,  but  only  a  limited  number  of  failures  are  recoverable. 

A  fault  tolerant  planning  domain  is  similar  to  a  deterministic  planning  domain.  How¬ 
ever,  in  addition  to  the  primary  effect  of  actions,  we  add  a  secondary  effect  that  describes 
the  outcome  of  a  failure.  Since  an  action  can  often  fail  in  many  different  ways,  we  allow 
the  secondary  effect  to  lead  to  one  of  several  possible  next  states.  Thus,  secondary  effects 
are non-deterministic.

Definition 6.1 (Fault Tolerant Planning Domain) A fault tolerant planning domain is a tuple $\langle S, Act, \rightarrow, \rightsquigarrow \rangle$ where S is a finite set of states, Act is a finite set of actions, $\rightarrow\ \subseteq S \times Act \times S$ is a deterministic transition relation of primary effects, and $\rightsquigarrow\ \subseteq S \times Act \times S$ is a non-deterministic transition relation of secondary effects. Instead of $(s, a, s') \in\ \rightarrow$ and $(s, a, s') \in\ \rightsquigarrow$, we write $s \xrightarrow{a} s'$ and $s \overset{a}{\rightsquigarrow} s'$, respectively.

The  planning  language  NADF+  described  in  Appendix  A  may  be  used  to  represent  fault 
tolerant  planning  domains.  An  n-fault  tolerant  planning  problem  is  a  deterministic  planning 
problem  extended  with  the  fault  limit  n. 

Definition 6.2 (N-Fault Tolerant Planning Problem) An n-fault tolerant planning problem
is a tuple $\langle \mathcal{D}, s_0, G, n \rangle$ where $\mathcal{D}$ is a fault tolerant planning domain, $s_0 \in S$ is an initial
state, $G \subseteq S$ is a set of goal states, and $n \in \mathbb{N}$ is an upper bound on the number of faults
the plan must be able to recover from.

An n-fault tolerant plan is defined via a transformation of an n-fault tolerant planning
problem to a non-deterministic planning problem. The transformation adds a fault counter
$f$ to the state description and models secondary effects only when $f < n$.

Definition 6.3 (Induced Non-Deterministic Planning Problem) Let $\mathcal{P} = \langle \mathcal{D}, s_0, G, n \rangle$
where $\mathcal{D} = \langle S, Act, \rightarrow, \rightsquigarrow \rangle$ be an n-fault tolerant planning problem. The non-deterministic
planning problem induced from $\mathcal{P}$ is $\mathcal{P}^{nd} = \langle \mathcal{D}^{nd}, (s_0, 0), G \times \{0, \ldots, n\} \rangle$ where
$\mathcal{D}^{nd} = \langle S^{nd}, Act^{nd}, \rightarrow^{nd} \rangle$ is given by

•  $S^{nd} = S \times \{0, \ldots, n\}$,

•  $Act^{nd} = Act$,

•  $(s, f) \xrightarrow{a}{}^{nd} (s', f')$ iff

   -  $s \xrightarrow{a} s'$ and $f' = f$, or

   -  $s \overset{a}{\rightsquigarrow} s'$, $f < n$, and $f' = f + 1$.
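The transformation of Definition 6.3 can be illustrated with explicit sets instead of BDDs. The following is a minimal sketch, not the thesis implementation: the function name `induce`, the dictionary encoding of the two transition relations, and the tiny three-transition domain are assumptions made for illustration only. It also assumes that a secondary effect is only listed for state-action pairs that have a primary effect.

```python
from itertools import product

def induce(states, primary, secondary, s0, goals, n):
    """Build the induced non-deterministic problem (explicit-set sketch).

    primary:   dict (s, a) -> s'          deterministic primary effects
    secondary: dict (s, a) -> set of s'   non-deterministic secondary effects
    Returns (states_nd, trans_nd, init_nd, goals_nd), where trans_nd maps
    ((s, f), a) to the set of possible successors (s', f').
    """
    counters = range(n + 1)
    states_nd = set(product(states, counters))
    trans_nd = {}
    for (s, a), s_next in primary.items():
        for f in counters:
            succ = {(s_next, f)}              # primary effect keeps the counter: f' = f
            if f < n:                         # a fault is only modeled while f < n
                succ |= {(e, f + 1) for e in secondary.get((s, a), ())}
            trans_nd[((s, f), a)] = succ
    return states_nd, trans_nd, (s0, 0), set(product(goals, counters))

# Hypothetical domain: a -> b -> g, where the first move may slip to c.
primary = {("a", "go"): "b", ("b", "go"): "g", ("c", "go"): "g"}
secondary = {("a", "go"): {"c"}}
S_nd, T_nd, init_nd, G_nd = induce({"a", "b", "c", "g"}, primary, secondary,
                                   "a", {"g"}, 1)
```

With the fault limit n = 1, the action in state (a, 0) has two outcomes, (b, 0) and (c, 1), while in state (a, 1) only the primary outcome (b, 1) remains.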

Definition 6.4 (Valid N-Fault Tolerant Plan) A valid n-fault tolerant plan for the n-fault
tolerant planning problem $\mathcal{P}$ is a non-deterministic plan $\pi$ for the non-deterministic planning
problem induced from $\mathcal{P}$ where $\mathcal{M}(\pi), s_0^{nd} \models \mathrm{AF}\, G^{nd}$.

Thus,  an  n-fault  tolerant  plan  is  valid  if  any  execution  path  where  at  most  n  failures  happen 
eventually  reaches  a  goal  state.  An  n-fault  tolerant  plan  is  optimal  if  it  has  minimum  worst 
case  execution  length. 

Definition 6.5 (Optimal N-Fault Tolerant Plan) An optimal n-fault tolerant plan is a valid
n-fault tolerant plan $\pi$ where $\mathrm{MAX}(s_0^{nd}, G^{nd}, \pi) = \mathrm{SDIST}(s_0^{nd}, G^{nd})$.


6.2  N-Fault  Tolerant  Planning  Algorithms 

One might suggest using a deterministic planning algorithm to generate n-fault tolerant
plans. Consider for instance synthesizing a 1-fault tolerant plan in a domain where there
is a non-faulting plan of length $k$ and at most $f$ error states of any action. It is tempting
to claim that a 1-fault tolerant plan can then be found using at most $kf$ calls to a classical
deterministic planning algorithm. This analysis, however, is flawed. It only holds for
evaluating a given 1-fault tolerant plan. It neglects that many additional calls to the classical
planning algorithm may be necessary in order to find a valid solution. Instead, we need
an efficient approach for finding plans for many states simultaneously. This can be done
with BDD-based non-deterministic planning. We first observe that it follows directly from
Definition 3.9 that the strong algorithm returns a valid n-fault tolerant plan, if it exists,
when given the induced non-deterministic planning problem as input. Moreover, if the
blind strong algorithm is used to generate the solution, it follows from Theorem 3.3 that
the returned n-fault tolerant plan is optimal. Let n-FTP$_S$ denote the Strong algorithm
applied to an n-fault tolerant planning problem. Since the performance of blind strong
planning is limited, we also consider solving n-fault tolerant planning problems with the
guided version of strong planning defined in the previous chapter. Let n-GFTP$_S$ denote the
GuidedStrong algorithm applied to an n-fault tolerant planning problem.
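To make the idea behind n-FTP$_S$ concrete, the sketch below runs a generic blind strong backward search over an explicitly enumerated induced problem. It is an illustration under stated assumptions: it follows the textbook strong preimage computation rather than the Strong algorithm of Chapter 3 (which is not reproduced here), it uses Python sets where the thesis uses BDDs, and the function name `strong_plan` and the example transitions are hypothetical.

```python
def strong_plan(trans, init, goals):
    """Blind backward strong planning over an explicit transition map.

    trans maps (state, action) -> set of possible next states.
    Returns a set of state-action pairs covering init, or None if no
    strong plan exists.
    """
    covered = set(goals)
    plan = set()
    while init not in covered:
        # Strong preimage: pairs outside the covered set whose outcomes
        # ALL lead into the covered set.
        pre = {(s, a) for (s, a), succ in trans.items()
               if s not in covered and succ <= covered}
        if not pre:
            return None                      # fixed point reached without init
        plan |= pre
        covered |= {s for s, _ in pre}
    return plan

# Induced problem for a hypothetical domain a -> b -> g with one possible
# fault in a (slip to c) and fault limit n = 1; states are (s, f) pairs.
trans = {
    (("a", 0), "go"): {("b", 0), ("c", 1)},
    (("a", 1), "go"): {("b", 1)},
    (("b", 0), "go"): {("g", 0)},
    (("b", 1), "go"): {("g", 1)},
    (("c", 0), "go"): {("g", 0)},
    (("c", 1), "go"): {("g", 1)},
}
goals = {("g", 0), ("g", 1)}
plan = strong_plan(trans, ("a", 0), goals)

# Variant where the fault leads to a dead end d: no 1-fault tolerant plan.
dead = dict(trans)
dead[(("a", 0), "go")] = {("b", 0), ("d", 1)}
no_plan = strong_plan(dead, ("a", 0), goals)
```

Because the search proceeds in breadth-first layers from the goal, the first layer that covers the initial state also bounds the worst-case execution length, which is the intuition behind the optimality remark above.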

We may expect n-GFTP$_S$ to be efficient when secondary effects are local in the state
space because they will then be covered by the search beam of n-GFTP$_S$. In practice,
however, secondary effects may be permanent malfunctions that, due to their impact on the
domain, cause a transition to a non-local state, that is, a state from which no short path of
primary effects exists to the source state. Indeed, in theory, the location of secondary effects
may be completely uncorrelated with the location of primary effects. To solve this problem,
we develop a specialized algorithm where the planning for primary and secondary effects
is decoupled. We constrain our investigation to 1-fault tolerant planning and introduce
two algorithms: 1-FTP using blind search and 1-GFTP using guided search. The input to
these algorithms is a 1-fault tolerant planning problem, not its induced non-deterministic
planning problem.

The 1-FTP algorithm is shown in Figure 6.1. The function PreImgSA$_f$ computes the
preimage of secondary effects. 1-FTP returns two non-deterministic plans $F^0$ and $F^1$ for
the fault tolerant domain, where $F^0$ is robust to one fault while $F^1$ is a recovery plan.

function 1-FTP(s0, G)
1   F^0 ← ∅; C^0 ← G
2   F^1 ← ∅; C^1 ← G
3   while s0 ∉ C^0
4       f_c^0 ← PreImgSA(C^0) \ C^0 × Act
5       f^0 ← f_c^0 \ PreImgSA_f(\overline{C^1})
6       while f^0 = ∅
7           f^1 ← PreImgSA(C^1) \ C^1 × Act
8           if f^1 = ∅ then return “no solution exists”
9           F^1 ← F^1 ∪ f^1
10          C^1 ← C^1 ∪ States(f^1)
11          f^0 ← f_c^0 \ PreImgSA_f(\overline{C^1})
12      F^0 ← F^0 ∪ f^0
13      C^0 ← C^0 ∪ States(f^0)
14  return (F^0, F^1)

Figure 6.1: The 1-FTP algorithm.


Example 6.1 An example of the non-deterministic plans $F^0$ and $F^1$ returned by 1-FTP is
shown in Figure 6.2.


Figure 6.2: An example of the non-deterministic plans $F^0$ and $F^1$ returned by
1-FTP. Primary and secondary effects of actions are drawn with solid and dashed lines,
respectively. In this example, we assume that $F^0$ forms a sequence of actions from
the initial state to a goal state, while $F^1$ recovers all the possible faults of actions in
$F^0$.

1-FTP performs a backward search from the goal states that alternates between blindly
expanding $F^0$ and $F^1$ such that failure states of $F^0$ can always be recovered by $F^1$. Initially,
$F^0$ and $F^1$ are assigned to empty plans (l. 1-2). The variables $C^0$ and $C^1$ are the states
covered by the current plans in $F^0$ and $F^1$. They are initialized to the goal states since these
states are covered by zero length plans. In each iteration of the outer loop (l. 3-13), $F^0$ is
expanded with SAs in $f^0$ (l. 12-13). First, a candidate $f_c^0$ is computed. It is the preimage
of the states covered by $F^0$, pruned for SAs of states already covered by $F^0$ (l. 4). The variable
$f^0$ is assigned to $f_c^0$ restricted to SAs for which all error states are covered by the current
recovery plan (l. 5). If $f^0$ is empty, the recovery plan is expanded in the inner loop until $f^0$
is nonempty (l. 6-11). If the recovery plan at some point has reached a fixed point and $f^0$ is
still empty, the algorithm terminates with failure, since in this case no recovery plan exists
(l. 8). We claim without proof that 1-FTP is sound, complete, and terminating.
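The control flow described above can be mirrored with explicit sets of state-action pairs. The following is a minimal sketch under stated assumptions, not the BDD-based implementation: the function name `one_ftp`, the dictionary encoding of primary and secondary effects, and the example domain are hypothetical, and the restriction to recoverable SAs is written directly as a subset test on the error states.

```python
def one_ftp(primary, secondary, s0, goals):
    """Explicit-set sketch of the 1-FTP loop structure.

    primary:   dict (s, a) -> s'              primary effects
    secondary: dict (s, a) -> set of states   error states of a failure
    Returns (F0, F1) as sets of state-action pairs, or None.
    """
    def preimg(C):
        # SAs of states outside C whose primary effect leads into C
        return {(s, a) for (s, a), t in primary.items() if t in C and s not in C}

    def recoverable(sas, C1):
        # keep the SAs whose error states are all covered by the recovery plan
        return {sa for sa in sas if secondary.get(sa, set()) <= C1}

    F0, C0 = set(), set(goals)
    F1, C1 = set(), set(goals)
    while s0 not in C0:
        fc0 = preimg(C0)                      # candidate for f0
        f0 = recoverable(fc0, C1)
        while not f0:                         # expand the recovery plan
            f1 = preimg(C1)
            if not f1:
                return None                   # recovery plan hit a fixed point
            F1 |= f1
            C1 |= {s for s, _ in f1}
            f0 = recoverable(fc0, C1)
        F0 |= f0
        C0 |= {s for s, _ in f0}
    return F0, F1

# Hypothetical domain: a -> b -> g and c -> g; the move in a may slip to c.
primary = {("a", "go"): "b", ("b", "go"): "g", ("c", "go"): "g"}
result = one_ftp(primary, {("a", "go"): {"c"}}, "a", {"g"})
# Same domain, but the fault leads to a dead end d.
dead_end = one_ftp(primary, {("a", "go"): {"d"}}, "a", {"g"})
```

In the first call the recovery plan is only expanded once, when the candidate for state a cannot yet be recovered; in the second call the recovery plan reaches a fixed point without covering d, so no solution is reported.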

1-FTP expands both $F^0$ and $F^1$ blindly. An inherent strategy of the algorithm, though,
is not to expand $F^1$ more than necessary to recover the faults of $F^0$. This is not the case
for n-FTP$_S$, which does not distinguish states with different numbers of faults. The aggressive
strategy of 1-FTP, however, makes it suboptimal, as the example in Figure 6.3 shows. In
the first two iterations of the outer loop, $(p_2, b)$ and $(p_1, b)$ are added to $F^0$ and nothing is
added to $F^1$. In the third iteration of the outer loop, $F^1$ is extended with $(p_2, b)$ and $(q_2, a)$
and $F^0$ is extended with $(q_2, a)$. In the last two iterations of the outer loop, $(q_1, a)$ and
$(s_0, a)$ are added to $F^0$. The resulting plan is

$F^0 = \{(s_0, a), (q_1, a), (q_2, a), (p_1, b), (p_2, b)\}$

$F^1 = \{(p_2, b), (q_2, a)\}$.

The worst case length of this 1-fault tolerant plan is 4. However, a 1-fault tolerant plan

$F^0 = \{(s_0, b), (p_1, b), (p_2, b)\}$

$F^1 = \{(q_1, a), (q_2, a)\}$

with worst case length of 3 exists.


Figure 6.3: A problem with a single goal state $g$ showing that 1-FTP may return
suboptimal solutions. Dashed lines indicate secondary effects. Notice that actions $a$
and $b$ only have secondary effects in $q_2$ and $s_0$, respectively. In all other states, the
actions are assumed always to succeed.

Despite the different search strategies applied by 1-FTP and 1-FTP$_S$, they both perform
blind search. A more interesting algorithm is a guided version of 1-FTP called 1-GFTP,
based on the non-deterministic state-set branching framework introduced in the previous
chapter. The overall design goal of 1-GFTP is to guide the expansion of $F^0$ toward the initial
state and guide the expansion of $F^1$ toward the failure states of $F^0$. However, this can be
accomplished in many different ways. Below we evaluate three different strategies. For
each algorithm, $F^0$ is guided in a pure heuristic manner toward the initial state using the
approach employed by n-GFTP$_S$.

The first strategy is to assume that failure states are local and guide $F^1$ toward the initial
state as well. The resulting algorithm is similar to 1-GFTP$_S$ and has poor performance. The
problem is that the pure heuristic approach causes $F^1$ only to cover a narrow beam of states
in the state space. Error states not within close distance to the primary effects tend not to be
covered by $F^1$. The strategy can be improved by widening the beam by taking the search
depth into account. However, this does not provide a satisfactory solution for non-local
states.

The second strategy is ideal in the sense that it dynamically guides the expansion of $F^1$
toward error states of the precomponents of $F^0$. This can be done by using a specialized
BDD operation that splits the precomponent of $F^1$ according to the Hamming distance to
the error states. The complexity of this operation, however, is exponential in the size of the
BDD representing the error states and the size of the BDD representing the precomponent
of $F^0$. Due to the dynamic programming used by the BDD package, the average complexity
may be much lower. However, this does not seem to be the case in practice.

The third strategy is the one chosen for 1-GFTP. It expands $F^1$ blindly, but then prunes
SAs from the precomponent of $F^1$ not used to recover error states of $F^0$. Thus, it uses an
indirect approach to guide the expansion of $F^1$. We expect this strategy to work well even
if the absolute position of error states is non-local. However, the strategy assumes that the
relative position of error states is local in the sense that the SAs in $F^1$ in expansion $i$ of $F^0$
are relevant for recovering error states in expansion $i + 1$ of $F^0$. In addition, we still have
an essential problem to solve: whether to expand $F^0$ or $F^1$. There are two extremes.

1.  Expand $F^1$ until first recovery of $f^0$. Compute a complete partitioned backward
    precomponent of $F^0$, expand $F^1$ until some partition in $f_c^0$ has recovered error states,
    and add the partition with least h-value to $F^0$.

2.  Expand $F^1$ until best recovery of $f^0$. Compute a complete partitioned backward
    precomponent of $F^0$, expand $F^1$ until the partition of $f_c^0$ with lowest h-value has
    recovered error states, and add this partition to $F^0$. If none of these error states can
    be recovered, then consider the partition with second lowest h-value, and so on.

It turns out that neither of these extremes works well in practice. The first is too conservative.
It may add a partition with a high h-value even though a partition with a low h-value can
be recovered given just a few more expansions of $F^1$. The second strategy is too greedy.
It ignores the complexity of expanding $F^1$ in order to recover error states of the partition
of $f_c^0$ with lowest h-value. Instead, we consider a mixed strategy: spend half of the last
expansion time on recovering error states of the partition of $f_c^0$ with lowest h-value and,
in case this is impossible, spend one fourth of the last expansion time on recovering error
states of the partition of $f_c^0$ with second lowest h-value, and so on.
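The halving schedule of the mixed strategy is easy to state as arithmetic. The sketch below only illustrates the time budgets; the function name `budgets` and its parameters are hypothetical and do not appear in the thesis.

```python
def budgets(last_expansion_time, num_partitions):
    """Time budget for the partition with the i-th lowest h-value, i = 1, 2, ...

    The first partition gets half of the previous expansion time, the second
    one fourth, and so on.
    """
    return [last_expansion_time / 2 ** i for i in range(1, num_partitions + 1)]

# Example: the previous expansion took 8 seconds and there are 3 partitions.
schedule = budgets(8.0, 3)
```

For k partitions the budgets sum to t(1 - 2^{-k}), so the timed part of an expansion never spends more than the duration t of the previous expansion.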

The 1-GFTP algorithm is shown in Figure 6.4. The keys in maps are sorted ascendingly.
The instantiation of $F^0$ and $F^1$ of 1-GFTP is similar to 1-FTP except that the
states in $C^0$ are partitioned with respect to their associated h-value. Initially, the map entry
$C^0[h_{goal}]$ is assigned to the goal states.¹ The variable $t$ stores the duration of the previous
expansion. Initially, it is given a small value $\epsilon$. In each iteration of the main loop
(l. 4-22), the precomponents $f^0$ and $f^1$ are computed and added to $F^0$ and $F^1$. First, the
start time $t_s$ is logged by reading the current time $t_{cpu}$ (l. 5). Then a map PC holding
a complete partitioned precomponent candidate of $F^0$ is computed by PreCompFTP (l. 6).
For each entry in $C^0$, PreCompFTP inserts the preimage in PC of each partition of a
disjunctive branching partitioning of the transition relation of primary effects. We assume
that this partitioning has $m$ subrelations $R_1, \ldots, R_m$ where the transitions represented by
$R_i$ are associated with a change $\delta h_i$ of the h-value (in forward direction). The inner loop
(l. 10-13) of 1-GFTP expands the two candidates $f_c^0$ and $f_c^1$ for $f^0$ and $f^1$. In each iteration,
a partition of the partitioned precomponent PC is added to $f_c^0$ (l. 12).² The function
ExpandTimed expands $f_c^1$. In iteration $i$, the time out bound of the expansion is $t/2^i$.
ExpandTimed returns early if

1.  a precomponent $f^0$ in the candidate $f_c^0$ is found where all error states are recovered
    (l. 5 and l. 11), or

2.  $f_c^1$ has reached a fixed point.

The preimage added to $f_c^1$ in iteration $i$ of ExpandTimed is stored in the map entry $f_c^1[i]$
in order to prune SAs not used for recovery.

¹ To simplify the presentation, we assume that all goal states have identical h-value. A generalization of
the algorithm is trivial.

² Recall that PC is traversed ascendingly such that the partition with lowest h-value is added first.

function 1-GFTP(s0, G)
1   F^0 ← ∅; C^0[h_goal] ← G
2   F^1 ← ∅; C^1 ← G
3   t ← ε
4   while s0 ∉ C^0
5       t_s ← t_cpu
6       PC ← PreCompFTP(C^0)
7       f^0 ← ∅; f_c^0 ← ∅
8       f_c^1 ← emptyMap
9       i ← 0
10      while f^0 = ∅ ∧ i < |PC|
11          i ← i + 1; t ← t/2
12          f_c^0 ← f_c^0 ∪ PC[i]
13          ⟨f_c^1, f^0⟩ ← ExpandTimed(f_c^0, f_c^1, C^1, t)
14      if f^0 = ∅ then
15          ⟨f_c^1, f^0⟩ ← ExpandTimed(f_c^0, f_c^1, C^1, ∞)
16      t ← t_cpu − t_s
17      if f^0 = ∅ then return “no solution exists”
18      f^1 ← PruneUnused(f_c^1, f^0)
19      F^1 ← F^1 ∪ f^1; C^1 ← C^1 ∪ States(f^1)
20      F^0 ← F^0 ∪ f^0
21      for j = 1 to i
22          C^0[h_j] ← C^0[h_j] ∪ States(f^0 ∩ PC[h_j])
23  return (F^0, F^1)

Figure 6.4: The 1-GFTP algorithm.

function PreCompFTP(C^0)
1   PC ← emptyMap
2   for i = 1 to |C^0|
3       for j = 1 to m
4           SA ← PreImgSA_j(C^0[h_i]) \ C^0 × Act
5           PC[h_i − δh_j] ← PC[h_i − δh_j] ∪ SA
6   return PC

Eventually $f_c^0$ may contain all the SAs in PC without any of these being recoverable. In
this case 1-GFTP expands $f_c^1$ (l. 15) untimed.

function ExpandTimed(f_c^0, f_c^1, C^1, t)
1   t_s ← t_cpu
2   Oldf_c^1 ← ⊥
3   i ← |f_c^1|
4   recovS ← States(f_c^1) ∪ C^1
5   f^0 ← f_c^0 \ PreImgSA_f(\overline{recovS})
6   while f^0 = ∅ ∧ Oldf_c^1 ≠ f_c^1 ∧ t_cpu − t_s < t
7       Oldf_c^1 ← f_c^1
8       i ← i + 1
9       f_c^1[i] ← PreImgSA(recovS) \ recovS × Act
10      recovS ← States(f_c^1) ∪ C^1
11      f^0 ← f_c^0 \ PreImgSA_f(\overline{recovS})
12  return ⟨f_c^1, f^0⟩
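The timed expansion can be sketched with explicit sets as follows. This is an illustration only: the name `expand_timed`, the list-of-layers encoding of the map $f_c^1$, and the use of `time.process_time` for $t_{cpu}$ are assumptions, and the fixed-point test is written as "the last preimage was empty" rather than as a comparison with the previous value of $f_c^1$.

```python
import time

def expand_timed(fc0, fc1_layers, C1, t, primary, secondary):
    """Expand the recovery candidate until some SA in fc0 is recoverable,
    a fixed point is reached, or the time bound t (seconds) is exceeded.

    fc1_layers is a list of sets of state-action pairs; layer k holds the
    preimage added in the (k+1)-th expansion. The list is extended in place.
    """
    start = time.process_time()

    def covered():
        return {s for layer in fc1_layers for s, _ in layer} | C1

    def recoverable(recov):
        # SAs of the candidate whose error states are all recoverable
        return {sa for sa in fc0 if secondary.get(sa, set()) <= recov}

    recov = covered()
    f0 = recoverable(recov)
    grew = True
    while not f0 and grew and time.process_time() - start < t:
        layer = {(s, a) for (s, a), nxt in primary.items()
                 if nxt in recov and s not in recov}
        grew = bool(layer)                 # empty preimage means fixed point
        if grew:
            fc1_layers.append(layer)
        recov = covered()
        f0 = recoverable(recov)
    return fc1_layers, f0

# Hypothetical domain: a -> b -> g and c -> g; the move in a may slip to c.
primary = {("a", "go"): "b", ("b", "go"): "g", ("c", "go"): "g"}
secondary = {("a", "go"): {"c"}}
layers, f0 = expand_timed({("a", "go")}, [], {"g"}, float("inf"),
                          primary, secondary)
```

With an unlimited time bound, one expansion suffices in this example because the error state c is one primary step away from the goal. With a time bound of zero the loop body is never entered, so the candidate is returned unexpanded.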

If $f_c^1$ has reached a fixed point but no recoverable precomponent $f^0$ exists, no 1-fault
tolerant plan exists and 1-GFTP returns with “no solution exists” (l. 17). Otherwise, $f_c^1$ is
pruned for SAs of states not used to recover the SAs in $f^0$ (l. 18). This pruning is computed
by PruneUnused, which traverses backward through the preimages of $f_c^1$ and marks
states that either are error states of SAs in $f^0$, or states needed to recover previously marked
states.

function PruneUnused(f_c^1, f^0)
1   err ← SAImg_f(f^0)
2   img ← ∅; marked ← ∅
3   for i = |f_c^1| to 1
4       f_c^1[i] ← f_c^1[i] ∩ ((err ∪ img) × Act)
5       marked ← marked ∪ States(f_c^1[i])
6       img ← SAImg(f_c^1[i])
7   return f_c^1 ∩ (marked × Act)


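The functions SAImg($\pi$) and SAImg$_f$($\pi$) compute the image states of a set of SAs $\pi$ for
primary and secondary effects, respectively.

$\mathrm{SAImg}(\pi) = \{ s' : \exists (s, a) \in \pi \,.\ s \xrightarrow{a} s' \}$    (6.1)

$\mathrm{SAImg}_f(\pi) = \{ s' : \exists (s, a) \in \pi \,.\ s \overset{a}{\rightsquigarrow} s' \}$    (6.2)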
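The two image functions and the backward marking pass of PruneUnused can be written out with explicit sets. The sketch below is hedged: the names `sa_img`, `sa_img_f`, and `prune_unused`, the list-of-layers encoding of $f_c^1$, and the example layers are assumptions made for illustration, and the final intersection with the marked states is folded into the per-layer pruning.

```python
def sa_img(sas, primary):
    """Image of a set of state-action pairs under primary effects (Eq. 6.1)."""
    return {primary[sa] for sa in sas if sa in primary}

def sa_img_f(sas, secondary):
    """Image of a set of state-action pairs under secondary effects (Eq. 6.2)."""
    return {e for sa in sas for e in secondary.get(sa, ())}

def prune_unused(fc1_layers, f0, primary, secondary):
    """Keep only recovery SAs needed to recover the error states of f0.

    fc1_layers[k] is the preimage added in the (k+1)-th expansion of the
    recovery candidate, so later layers are further away from the goal.
    """
    err = sa_img_f(f0, secondary)
    img = set()
    kept = set()
    for layer in reversed(fc1_layers):        # traverse from the last layer
        needed = err | img                    # error states or states on a
        pruned = {(s, a) for (s, a) in layer  # recovery path already marked
                  if s in needed}
        kept |= pruned
        img = sa_img(pruned, primary)
    return kept

# Hypothetical recovery layers toward goal g: x, b, c are one step away,
# y is two steps away (y -> x -> g).
primary = {("b", "go"): "g", ("c", "go"): "g", ("x", "go"): "g",
           ("y", "go"): "x", ("a", "go"): "b"}
layers = [{("b", "go"), ("c", "go"), ("x", "go")}, {("y", "go")}]
near = prune_unused(layers, {("a", "go")}, primary, {("a", "go"): {"c"}})
far = prune_unused(layers, {("a", "go")}, primary, {("a", "go"): {"y"}})
```

When the only error state is c, a single recovery SA survives; when the error state is y, the pass keeps the two SAs on the path from y to the goal and drops the rest.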

The updating of $F^0$ and $F^1$ of 1-GFTP (l. 19-22) is similar to 1-FTP, except that $C^0$ is
updated by iterating over PC and picking SAs in $f^0$. Notice that in this iteration $h_j$ refers to
the keys of PC. We claim without proof that 1-GFTP is sound, complete, and terminating.

The specialized algorithms can be generalized to $n$ faults by adding more recovery plans
$F^n, F^{n-1}, \ldots, F^0$. For n-GFTP, all of these recovery plans would be indirectly guided by
the expansion of $F^n$. The algorithm is illustrated in Figure 6.5.


6.3  Experimental  Evaluation 

The purpose of the experimental evaluation is not only to compare the performance of the
developed algorithms, but also to investigate the properties of fault tolerant planning in
significant real-world domains. The algorithms 1-FTP, 1-GFTP, 1-FTP$_S$, and 1-GFTP$_S$
have been implemented in the BIFROST 0.7 search engine. All experiments have been
carried out in the experimental setting described in Appendix A. As usual, we represent
the parameter setting of the BuDDy package by the number of allocated BDD nodes in the
unique table (n) and the number of allocated BDD nodes in the operator caches (c). Time
is measured in seconds and the size of a BDD is measured in number of BDD nodes.


6.3.1  Unguided  Search 

We first focus on unguided search and study four fault tolerant planning domains. Two of
these, DS1 and PSR, are models of real-world domains.

Figure 6.5: An example of $F^n, \ldots, F^0$ produced by a specialized n-fault tolerant
planning algorithm. Primary and secondary effects of actions are drawn with solid
and dashed lines, respectively.


DS1 

DS1 is based on an SMV encoding [131] of the Livingstone model [173] used by the Remote
Agent for NASA’s Deep Space One probe. The Livingstone model describes the
electrical system of the spacecraft. It consists of a system bus and a number of units
connected to the bus. These units include a power distribution subsystem, an Ion Propulsion
System (IPS), Propulsion Drive Electronics (PDE), a Reaction Control System (RCS), an
Attitude Control System (ACS), a Star Tracker Unit (SRU), and a MICAS camera. We recast
the SMV encoding as a fault tolerant planning problem in NADL+. Each bus-command
is an action. The primary effect of the command is the changes it causes on the electrical
system given that all units work correctly. The secondary effect of an action is one of the two
faults F2 and F4 considered in the Remote Agent Experiment [124].

F2  :  camera  or  pasm  switch  is  recoverably  stuck  on/off. 

F4  :  an  x-z  thruster  valve  is  permanently  stuck  closed. 


In addition to these two faults, the Remote Agent Experiment considered two other errors.
We are not modeling these since no 1-fault tolerant plan exists when taking all four faults
into account. The following simplifications have been made in the NADL+ model of the
SMV description:

1.  we assume that the state of components is known,

2.  attitude errors are assumed to be deterministically computable,

3.  relative thrust is assumed to be low or nominal if a valve is stuck, otherwise nominal,

4.  redundant state variables in the SMV model have been removed.³


The NADL+ encoding of the domain has 84 Boolean state variables. We consider generating
a 1-fault tolerant plan from an initial state where the IPS is in standby mode, the MICAS
camera is “off”, and the pasm switch is “on”. The goal is to reach a state where the IPS
is in thrusting mode, the MICAS camera is “on”, and the pasm switch is “off”. The BDD
package parameters are n = 1M and c = 100K. The threshold for merging partitions of a
disjunctive transition relation partitioning is 5000. The total size of the transition relation is
104881 and is computed in 0.42 seconds. The size of the solution is 535 and the total CPU
time is 1.15 seconds. The experiment shows that a BDD encoding is very efficient for the
kind of constraints modeled by DS1. Despite a fairly large and dense model, a disjunctive
transition relation is fast to compute. In addition, a 1-fault tolerant plan for a non-trivial
problem in this domain is small and can be generated in less than a second. The experiment
demonstrates that BDD-based fault tolerant planning is mature enough to be applied to significant
real-world problems. An important lesson to learn from the investigation of DS1 is that
even 1-fault tolerance imposes a strong restriction on a physical system. No 1-fault tolerant
plan exists for the problem if all of the original four failures are considered.


Figure 6.6: The linear PSR domain.


³ An automatic approach for doing this has been developed in [178].

PSR

The PSR domain is described in Section 5.3.1. The primary effect of the Open and Close
actions on switches is that the switches open and close accordingly. The secondary effect
is that they break and get stuck in their current position. We compare the performance
of 1-FTP and 1-FTP$_S$ in two versions of the domain. The first is the “simple” domain
described in Section 5.3.1. In the initial state, all switches are open and the goal is to feed all
lines. 1-FTP and 1-FTP$_S$ solve this problem in 6.8 and 11.25 seconds, respectively (0.98
seconds is used on memory allocation, n = 1M and c = 700K).

The second version of the domain is the linear network shown in Figure 6.6. Again,
the initial state is that all switches are open, and the goal is to feed all lines. The results are
shown in Figure 6.7. The BDD package parameters are n = 15M and c = 500K and 3.38
seconds are used on memory allocation. 1-FTP performs significantly better than 1-FTP$_S$
on this problem. Interestingly, the performance difference is not reflected by the plan sizes.
However, this may be an artifact caused by the fact that the plan size for 1-FTP is a sum of
the size of two BDDs, while the plan size for 1-FTP$_S$ is the size of a single BDD. Similarly
to the DS1 domain, 1-fault tolerance imposes a strong constraint on the PSR domain. For
most configurations where a few units have already failed, no 1-fault tolerant plan exists.


Figure 6.7: Results of the PSR problems.

Power  Plant 

The power plant domain is shown in Figure 6.8 and originates in [93]. The task is to execute
the correct control actions in order to bring the plant from some bad state, where the plant
is unsafe or not working properly, to some good state, where the plant satisfies its safety
and activity requirements. A single reactor R is surrounded by four heat exchangers H1,
H2, H3 and H4. The heat exchangers produce high pressure steam to the four electricity
generating turbines T1, T2, T3 and T4. The heat exchangers can fail and leak radioactive
substances from the internal water loop to the external steam loop. If this happens, the
blocking valve (a1, a2, a3 or a4) of the heat exchanger must be closed. However, these
valves can fail too, in which case the valves m2, m3 or m1 are used. Similarly, if turbines
fail, they must be shut down by closing one of the valves b1, b2, b3 or b4, or m4, m5 and
m1. The energy production $p$ of the plant can either be 0, 1, 2, 3 or 4 units of energy per time
unit. The production must be adjusted to fit the demand $f$, if possible. A heat exchanger
can only transfer enough energy to a single turbine, and a single turbine can only produce
one unit of energy per time unit. The initial state is shown in Figure 6.8. A failure of heat
exchanger 1 is assumed to have just happened.


Figure 6.8: The power plant domain. An open valve is drawn solid and allows
water or steam to flow through it. In the depicted state, a failure of heat exchanger 1
is assumed just to have happened.


Figure 6.9: Results of the power plant experiment. The total CPU time and plan
size are given by $t_{total}$ and $|sol|$, respectively. The size of the problem is the number of
Boolean state variables.

We compare the performance of 1-FTP and 1-FTP$_S$ in two versions of the domain.
The first considers controlling a single power plant. The second considers controlling two
power plants simultaneously. The results are shown in Figure 6.9. In both experiments, the
parameters of the BDD package are n = 15M and c = 500K. The time spent on memory
allocation is 3.4 seconds. 1-FTP has a slightly better performance than 1-FTP$_S$. However,
both algorithms suffer from a large growth rate of the BDDs representing the frontier of the
backward search. Again, 1-fault tolerant plans turn out to be hard to generate. Even though
the system is highly redundant, 1-fault tolerant plans only exist for simple malfunctions like
the one investigated in this experiment.

Beam Walk

The Beam Walk domain was introduced in [36] and considers a robot walking on a
beam. The primary effect of the move action is that the robot moves one step forward
on the beam. The secondary effect is that it falls down from the beam. The domain is
shown in Figure 6.10. The Beam Walk domain represents a worst case scenario for 1-FTP
and 1-FTP$_S$ since a fault in the last step to reach the goal causes a transition to the state
furthest away from the goal. Both algorithms must iterate over all states before a solution
is found. The results are shown in Figure 6.11. As expected, both algorithms have a limited
performance in this domain. Again, however, we observe a slightly better performance of
1-FTP.


Figure 6.10: The Beam Walk domain. Solid edges denote primary effects of the
move action, while dashed edges denote secondary effects.


Figure 6.11: Results of the Beam Walk experiments.

6.3.2  Guided  Search 

The main purpose of the experiments in this section is to study the difference between
1-GFTP and 1-GFTP$_S$. In particular, we are interested in investigating how sensitive these
algorithms are to non-local error states and to what extent we may expect this to be a
problem in practice. We study three domains, of which SIDMAR descends from a real-world
study.

LV 

The LV domain is an artificial domain designed to demonstrate the different
properties of 1-GFTP and 1-GFTP_S. It is an $m \times m$ grid world with initial state $(0, m - 1)$
and goal state $(\lfloor m/2 \rfloor, \lfloor m/2 \rfloor)$. The actions are Up, Down, Left, and Right. Above the
$y = x$ line, actions may fail, causing the x and y positions to be swapped. Thus, error states
are mirrored in the $y = x$ line. A 9 x 9 instance of the problem is shown in Figure 6.12.
The essential property is that error states are non-local, but that two states close to each
other also have error states close to each other. This is the assumption made by 1-GFTP,
but not by 1-GFTP_S, which requires error states to be local. The heuristic value of a state is the
Manhattan distance to the initial state. The BDD package parameters are n = 5M and
c = 500k. Memory allocation takes 1.4 seconds. The results are shown in Figure 6.13. As
depicted, the performance of 1-GFTP_S degrades very fast with m due to the misguidance
of the heuristic for the recovery part of the plan. Its total CPU time is more than 500
seconds after the first three experiments. In contrast, 1-GFTP is fairly unaffected by the error states.
To explain this, consider how the backward search proceeds from the goal state. The guided
precomponents of $F^0$ will cause this plan to beam out toward the initial state. Due to the
relative locality of error states, the pruning of $F^1$ will cause $F^1$ to beam out in the opposite
direction. Thus, both $F^0$ and $F^1$ remain small during the search.

Figure 6.12: The 9 x 9 instance of the LV domain.

Figure 6.13: Results of the LV experiments (x-axis: vertical and horizontal board dimension).
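
The difference between non-local and relatively local error states can be made concrete with a small sketch. The code below is illustrative only: the function names `error_state` and `manhattan` and the two sample states are assumptions introduced here, not part of the experimental implementation.

```python
# Illustrative sketch of the LV error model (all names are hypothetical).

def error_state(state):
    """A failed action swaps the x and y position (mirror in the y = x line)."""
    x, y = state
    return (y, x)

def manhattan(a, b):
    """Manhattan distance between two grid positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

m = 9
initial = (0, m - 1)        # initial state (0, m - 1)
goal = (m // 2, m // 2)     # goal state (floor(m/2), floor(m/2))

# Two neighbouring states above the y = x line (y > x), chosen arbitrarily.
s, t = (1, 6), (2, 6)

# Non-local: the error state of s lies far away from s itself ...
dist_to_own_error = manhattan(s, error_state(s))
# ... but relatively local: neighbouring states have neighbouring error states.
dist_between_errors = manhattan(error_state(s), error_state(t))
```

In this instance the error state of `s` is ten steps away from `s`, while the error states of the two neighbours are only one step apart, which is the property 1-GFTP relies on.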


8-Puzzle 

The 8-Puzzle further demonstrates this difference between 1-GFTP and 1-GFTP_S. We
consider a non-deterministic version of the 8-Puzzle where the secondary effects are self
loops. Thus, error states are the most local possible. We use the usual sum of Manhattan
distances of tiles as a heuristic for the distance to the initial state. The experiment
compares the performance of 1-FTP, 1-GFTP, 1-FTP_S, and 1-GFTP_S. The BDD package
parameters are n = 1M and c = 100k. Memory allocation takes 0.29 seconds. The
number of Boolean state variables is 35 in all experiments. The results are shown in
Figure 6.14. Again, 1-FTP performs substantially better than 1-FTP_S. The guided algorithms
1-GFTP and 1-GFTP_S perform much better than the unguided algorithms. Due
to local error states, however, there is no substantial performance difference between these
two algorithms. As depicted, 1-FTP is slightly faster than 1-GFTP_S in the experiment with
a minimum deterministic solution length of 14. For such small problems, we may expect to
see this since 1-FTP only expands the recovery plan when needed, while 1-GFTP_S expands
the recovery part of its plan in each iteration.

SIDMAR 

The final experiments are on the SIDMAR domain introduced in Section 5.3.1. The purpose
of these experiments is to study the robustness of 1-GFTP and 1-GFTP_S to the kind of
errors found in real-world domains. The primary effects of actions are to move, lift, and
perform treatments of ladles on machines. The secondary effects are that machines break
permanently and moves fail. We consider casting two ladles of steel. The heuristic is
the sum of machine treatments carried out on the ladles. The experiment compares the
performance of 1-FTP, 1-GFTP, 1-FTP_S, and 1-GFTP_S. The BDD package parameters
are n = 5M and c = 500k. Memory allocation takes 1.41 seconds. The number of
Boolean state variables is 47 in all experiments. The results are shown in Figure 6.15.
Missing data points indicate that the associated algorithm spent more than 500 seconds
trying to solve the problem. The only algorithm with good performance is 1-GFTP. The
experiment indicates that real-world domains may have non-local error states that limit the
performance of 1-GFTP_S. Also notice that this is the only domain where 1-FTP does not
outperform 1-FTP_S. In this domain, 1-FTP seems to be finding complex plans in order to
fulfill the requirement that the recovery plan is minimum. Thus, the strategy of 1-FTP to
keep the recovery plan as small as possible does not seem to be an advantage in general.

Figure 6.14: Results of the 8-Puzzle experiments.


6.4  Conclusion 

The experimental evaluation shows that 1-GFTP consistently outperforms its strong
algorithm counterpart 1-GFTP_S and in particular is robust to non-local error states. Our
investigation of real-world domains suggests that such error states exist and are caused
by permanent failures. Despite the blind search of 1-FTP, it often outperforms its strong
algorithm counterpart 1-FTP_S since it may avoid producing large recovery plans.

The experimental evaluation of DS1, Power Plant, and PSR further shows that 1-fault
tolerant plans often do not exist even for highly redundant physical systems. This suggests
that a fruitful direction for future work is to define classes of fault tolerant plans that are
more relaxed than 1-fault tolerant plans. Another direction of work is to consider fault
tolerant plans that are adjusted to the likelihood of faults. The more likely a fault is, the
more robust the fault tolerant plan should be to it. Finally, it seems fairly simple to allow
non-deterministic primary effects. In this case the strong precomponent would be natural
to use to expand the nonfaulting and recovery part of the plan.

Figure 6.15: Results of the SIDMAR experiments.

6.5 Summary

In this chapter, we have introduced a new class of non-deterministic plans called fault
tolerant plans. Fault tolerant plans address domains where non-determinism is caused by
infrequent errors. For such domains strong and strong cyclic solutions seldom exist since
any action may fail. Fault tolerant plans relax this problem by being robust only to a
limited number of faults occurring during execution. Fault tolerant plans can be synthesized
with the strong algorithm by reducing a fault tolerant planning problem to a
non-deterministic planning problem. However, due to non-local error states, we introduce a
specialized guided algorithm called 1-GFTP that decouples the guiding of the fault tolerant
and recovery part of the plan. The experimental evaluation indicates that this decoupling
may be very helpful for obtaining good performance in real-world domains.

Chapter 7

Adversarial  Planning 


In the previous chapter, we identified faults as a major source of non-determinism in
real-world domains. In this chapter, we introduce a new framework called adversarial planning
[94] to address domains where non-determinism is caused by simultaneous actions of a
controllable system agent and an uncontrollable and possibly hostile environment agent.
Each state is associated with a set of actions that are applicable by the system agent and
a set of actions that are applicable by the environment agent. In each execution step, the
two agents each select one of their applicable actions. They have no knowledge about the
action selected by the other agent. The two actions form a joint action that causes a
transition to a new state.

We begin our description of adversarial planning in Section 7.1 by modifying the
non-deterministic domain model introduced in Section 3.2 to represent system and environment
actions. We then demonstrate that for these domains there exist plans that are more
powerful than weak and strong cyclic plans. These adversarial plans can be generated by
reasoning explicitly about environment actions. In Section 7.2, we introduce two new
algorithms for synthesizing weak adversarial plans and strong cyclic adversarial plans.
We prove that, in contrast to strong cyclic plans, strong cyclic adversarial plans guarantee
goal achievement independent of the environment behavior if actions are selected randomly
from the plan. Similarly, we prove that, given that actions are selected randomly from the
plan, weak adversarial plans improve the quality of weak plans by guaranteeing that there is a
non-zero probability of reaching a goal state independent of the behavior of the environment.
In Section 7.4, the algorithms are evaluated experimentally both in terms of their
computational efficiency and in terms of the quality of the produced plans. Finally, we
draw conclusions in Section 7.5.

7.1 Adversarial Planning Problems

An  adversarial  planning  domain  has  two  active  agents:  a  system  and  an  environment.  The 
task  is  to  construct  plans  for  the  system  in  order  for  it  to  achieve  a  goal.  The  environment 
may  be  an  intelligent  adversary  (or  it  may  simply  be  an  advantage  to  assume  that)  who  is 
fully informed about the structure of the domain and the limitations of the system’s ability
to construct plans.¹

An  adversarial  planning  domain  is  a  non-deterministic  planning  domain  with  a  set  of 
controllable  system  actions  and  a  set  of  uncontrollable  environment  actions.  System  and 
environment  actions  are  synchronous.  The  transition  relation  of  the  domain  describes  the 
effects  of  joint  system  and  environment  actions.  The  transition  relation  is  deterministic  to 
reflect  that  the  only  source  of  non-determinism  is  uncontrollable  environment  actions. 

Definition 7.1 (Adversarial Planning Domain) An adversarial planning domain is a tuple
$(S, Act_s, Act_e, \rightarrow)$ where $S$ is a finite set of states, $Act_s$ is a finite set of system actions,
$Act_e$ is a finite set of environment actions, and $\rightarrow\ \subseteq S \times Act_s \times Act_e \times S$ is a deterministic
transition relation of joint system and environment actions. Instead of $(s, a_s, a_e, s') \in\ \rightarrow$,
we write $s \xrightarrow{a_s, a_e} s'$.

Adversarial planning domains can be described in NADL+. The sets of applicable actions
of a state s are defined by the functions

$$\mathrm{App}(s) = \{ (a_s, a_e) : \exists s' \,.\, s \xrightarrow{a_s, a_e} s' \} \tag{7.1}$$

$$\mathrm{App}_s(s) = \{ a_s : \exists a_e \,.\, (a_s, a_e) \in \mathrm{App}(s) \} \tag{7.2}$$

$$\mathrm{App}_e(s) = \{ a_e : \exists a_s \,.\, (a_s, a_e) \in \mathrm{App}(s) \} \tag{7.3}$$

where $\mathrm{App}(s)$, $\mathrm{App}_s(s)$, and $\mathrm{App}_e(s)$ give the set of joint actions, system actions, and
environment actions applicable in s, respectively. It is required that system and environment
actions are independent at each state. Otherwise the system can indirectly control the
environment by making some of its actions inapplicable, and vice versa. Thus

$$\mathrm{App}(s) = \mathrm{App}_s(s) \times \mathrm{App}_e(s). \tag{7.4}$$

The set of states that can be reached from s by some joint action involving the
system action $a_s$ is given by

$$\mathrm{Next}_s(s, a_s) = \{ s' : \exists a_e \,.\, s \xrightarrow{a_s, a_e} s' \}. \tag{7.5}$$

An adversarial planning problem is defined by an initial state and a set of goal states of
the system.

¹ This is a standard assumption in, e.g., matrix games and extensive form games [127].

Definition 7.2 (Adversarial Planning Problem) An adversarial planning problem is a tuple
$(\mathcal{D}, s_0, G)$ where $\mathcal{D}$ is an adversarial planning domain, $s_0 \in S$ is an initial state, and
$G \subseteq S$ is a set of goal states.

An adversarial plan is a plan for the system represented in the usual way as a set of
state-action pairs.

Definition 7.3 (System State-Action Pair (SSA)) Let $\mathcal{D}$ be an adversarial planning domain.
A system state-action pair $(s, a_s)$ of $\mathcal{D}$ is a state $s$ of $\mathcal{D}$ associated with an applicable
system action $a_s \in \mathrm{App}_s(s)$.

Definition 7.4 (System Plan) Let $\mathcal{D}$ be an adversarial planning domain. A system plan $\pi_s$
for $\mathcal{D}$ is a set of SSAs of $\mathcal{D}$.

We will use $\pi_s$ to denote system plans and often refer to them as adversarial plans.

Example 7.1 For the adversarial planning problem shown in Figure 7.1(a), we have

$$\begin{aligned}
S &= \{I, F, D, U, G\},\\
s_0 &= I,\\
G &= \{G\},\\
Act_s &= \{+s, -s\},\\
Act_e &= \{+e, -e\},\\
\rightarrow &= \{(I, +s, -e, F),\ (I, -s, -e, U),\ (F, -s, -e, F),\ (F, +s, +e, F),\\
&\qquad (F, +s, -e, G),\ (F, -s, +e, G),\ (U, -s, -e, D),\ (U, +s, +e, U),\\
&\qquad (U, +s, -e, G),\ (U, -s, +e, G)\}.
\end{aligned}$$

Notice that this transition relation fulfills the requirement $\mathrm{App}(s) = \mathrm{App}_s(s) \times \mathrm{App}_e(s)$ for
any state s. The state D is a dead end, since the goal is unreachable from D. This introduces
an important difference between F and U that captures a main aspect of the adversarial
planning problem. We can view the two states F and U as states in which the system and
environment have different opportunities. Observe that the system “wins”, i.e., reaches
the goal, only if the signs of the two actions in the joint action are different. Otherwise it
“loses” since there is no transition to the goal with a joint action where the actions have
the same sign. The goal is reachable from both F and U. However, the consequences of
losing are different for F and U. In F, losing causes a transition back to F. Thus, the goal
is still reachable. In U, however, losing may cause a transition to the dead end D, which
makes it impossible to reach the goal in subsequent steps. Consider how an adversarial and
informed environment can take advantage of the possibility of reaching a dead end from U.
Since this may happen if the system applies -s in U, it is reasonable for the environment to
assume that the system will always execute +s in U. But now the environment can prevent
the system from ever reaching the goal by always choosing action +e, so the system should
completely avoid the state U. This example domain is important because it illustrates how
an adversarial environment can act purposely to obstruct achievement of the goal. ◇

Figure 7.1: (a) An adversarial planning problem with five states {I, F, D, U, G}, an
initial state I, and a single goal state G. The system and environment have actions
{+s, -s} and {+e, -e}, respectively. (b) The induced non-deterministic planning
problem of the adversarial planning problem shown in (a), where the information about
environment actions has been abstracted.
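
The definitions of App, App_s, App_e, and Next_s can be checked directly on Example 7.1. The following is a minimal explicit-state sketch, assuming a plain Python set of tuples for the transition relation (the thesis itself represents these sets symbolically with BDDs); the helper names `app`, `app_s`, `app_e`, and `next_s` are chosen here for illustration.

```python
from itertools import product

# Explicit-state encoding of Example 7.1 (illustrative; not a BDD representation).
STATES = {"I", "F", "D", "U", "G"}
T = {
    ("I", "+s", "-e", "F"), ("I", "-s", "-e", "U"),
    ("F", "-s", "-e", "F"), ("F", "+s", "+e", "F"),
    ("F", "+s", "-e", "G"), ("F", "-s", "+e", "G"),
    ("U", "-s", "-e", "D"), ("U", "+s", "+e", "U"),
    ("U", "+s", "-e", "G"), ("U", "-s", "+e", "G"),
}

def app(s):
    """Joint actions applicable in s (Equation 7.1)."""
    return {(a_s, a_e) for (s1, a_s, a_e, _) in T if s1 == s}

def app_s(s):
    """System actions applicable in s (Equation 7.2)."""
    return {a_s for (a_s, _) in app(s)}

def app_e(s):
    """Environment actions applicable in s (Equation 7.3)."""
    return {a_e for (_, a_e) in app(s)}

def next_s(s, a_s):
    """States reachable from s by some joint action involving a_s (Equation 7.5)."""
    return {s2 for (s1, a, _, s2) in T if s1 == s and a == a_s}

# Independence requirement (Equation 7.4): App(s) = App_s(s) x App_e(s) in every state.
independent = all(app(s) == set(product(app_s(s), app_e(s))) for s in STATES)
```

Here `next_s("U", "-s")` contains the dead end D, which is why the environment can expect the system to avoid -s in U.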

In order to introduce an execution model, we also need to define environment plans.
These are sets of state-action pairs, where the action is an environment action.

Definition 7.5 (Environment State-Action Pair (ESA)) Let $\mathcal{D}$ be an adversarial planning
domain. An environment state-action pair $(s, a_e)$ of $\mathcal{D}$ is a state $s$ of $\mathcal{D}$ associated
with an applicable environment action $a_e \in \mathrm{App}_e(s)$.

Definition 7.6 (Environment Plan) Let $\mathcal{D}$ be an adversarial planning domain. An
environment plan $\pi_e$ for $\mathcal{D}$ is a set of ESAs of $\mathcal{D}$.

We will use $\pi_e$ to denote environment plans. The sets of states covered by a system plan, an
environment plan, and a combined plan are given by

$$\mathrm{States}_s(\pi_s) = \{ s : \exists a_s \,.\, (s, a_s) \in \pi_s \} \tag{7.6}$$

$$\mathrm{States}_e(\pi_e) = \{ s : \exists a_e \,.\, (s, a_e) \in \pi_e \} \tag{7.7}$$

$$\mathrm{States}(\pi_s, \pi_e) = \mathrm{States}_s(\pi_s) \cap \mathrm{States}_e(\pi_e). \tag{7.8}$$

The sets of actions of a plan associated with a state s are

$$\mathrm{ACT}_s(\pi_s, s) = \{ a_s : (s, a_s) \in \pi_s \} \tag{7.9}$$

$$\mathrm{ACT}_e(\pi_e, s) = \{ a_e : (s, a_e) \in \pi_e \}. \tag{7.10}$$

The set of possible end states of a combined system plan $\pi_s$ and environment plan $\pi_e$ is
given by

$$\mathrm{Closure}(\pi_s, \pi_e) = \{ s' \notin \mathrm{States}(\pi_s, \pi_e) : \exists s,\ a_s \in \mathrm{ACT}_s(\pi_s, s),\ a_e \in \mathrm{ACT}_e(\pi_e, s) \,.\, s \xrightarrow{a_s, a_e} s' \}. \tag{7.11}$$

We can now define the execution model of a system and environment plan.

Definition 7.7 (Execution Model) An execution model with respect to a system plan $\pi_s$
and an environment plan $\pi_e$ for the adversarial domain $\mathcal{D} = (S, Act_s, Act_e, \rightarrow)$ is a Kripke
structure $\mathcal{M}(\pi_s, \pi_e) = (S, R)$ where

• $S = \mathrm{Closure}(\pi_s, \pi_e) \cup \mathrm{States}(\pi_s, \pi_e) \cup G$,

• $(s, s') \in R$ iff $s \notin G$, $\exists a_s, a_e \,.\, (s, a_s) \in \pi_s$, $(s, a_e) \in \pi_e$, and $s \xrightarrow{a_s, a_e} s'$, or $s = s'$
and $s \in \mathrm{Closure}(\pi_s, \pi_e) \cup G$.

The execution paths starting at s of the system plan $\pi_s$ and environment plan $\pi_e$ are given
by

$$\mathrm{Exec}(s, \pi_s, \pi_e) = \{ q : q \text{ is a path of } \mathcal{M}(\pi_s, \pi_e) \text{ and } q_0 = s \}. \tag{7.12}$$
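
The execution model can be illustrated on the domain of Example 7.1. The sketch below is a hedged, explicit-state illustration: the two plans `pi_s` and `pi_e` are hypothetical choices made here (the environment plan always answers with +e in U), and `execution_model` and `can_reach` are helper names introduced for this example only.

```python
# Transition relation of Example 7.1 as (s, a_s, a_e, s') tuples.
T = {
    ("I", "+s", "-e", "F"), ("I", "-s", "-e", "U"),
    ("F", "-s", "-e", "F"), ("F", "+s", "+e", "F"),
    ("F", "+s", "-e", "G"), ("F", "-s", "+e", "G"),
    ("U", "-s", "-e", "D"), ("U", "+s", "+e", "U"),
    ("U", "+s", "-e", "G"), ("U", "-s", "+e", "G"),
}
GOALS = {"G"}

def states(plan):
    """States covered by a plan given as a set of (state, action) pairs."""
    return {s for (s, _) in plan}

def act(plan, s):
    """Actions a plan associates with state s."""
    return {a for (s1, a) in plan if s1 == s}

def execution_model(pi_s, pi_e, goals):
    """Kripke structure (S, R) in the spirit of Definition 7.7."""
    covered = states(pi_s) & states(pi_e)
    # Joint steps allowed by both plans.
    steps = {(s, s2) for (s, a_s, a_e, s2) in T
             if a_s in act(pi_s, s) and a_e in act(pi_e, s)}
    closure = {s2 for (_, s2) in steps if s2 not in covered}
    S = closure | covered | goals
    R = {(s, s2) for (s, s2) in steps if s not in goals}
    R |= {(s, s) for s in closure | goals}   # self loops on end states
    return S, R

def can_reach(R, start, targets):
    """True iff some path of R from start visits a target state (EF targets)."""
    seen, frontier = {start}, [start]
    while frontier:
        s = frontier.pop()
        if s in targets:
            return True
        for (u, v) in R:
            if u == s and v not in seen:
                seen.add(v)
                frontier.append(v)
    return False

# Hypothetical plans: the system never plays -s in U; the environment always plays +e in U.
pi_s = {("I", "+s"), ("I", "-s"), ("F", "+s"), ("F", "-s"), ("U", "+s")}
pi_e = {("I", "-e"), ("F", "+e"), ("F", "-e"), ("U", "+e")}
S, R = execution_model(pi_s, pi_e, GOALS)
```

Against this environment plan, G is reachable from I (through F) but not from U, since the only transition out of U is the self loop.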

An important question is whether adversarial plans can be generated via a transformation to a
non-deterministic planning problem and an application of an existing non-deterministic
planning algorithm, as was done with fault tolerant plans. One approach is to let the
joint actions of the system and the environment form the actions of a corresponding
non-deterministic planning problem. However, this would imply that joint actions are
controllable, which is inconsistent with the assumption that environment actions are
uncontrollable. There does not seem to exist a simple solution to this problem except the
obvious: to model the effect of environment actions as non-determinism of system actions.
This transformation is defined as the induced non-deterministic planning problem of an
adversarial planning problem.

Definition 7.8 (Induced Non-Deterministic Planning Problem) Let $\mathcal{P} = (\mathcal{D}, s_0, G)$ where
$\mathcal{D} = (S, Act_s, Act_e, \rightarrow)$ be an adversarial planning problem. The non-deterministic
planning problem induced from $\mathcal{P}$ is $\mathcal{P}^{nd} = (\mathcal{D}^{nd}, s_0, G)$ where $\mathcal{D}^{nd} = (S^{nd}, Act^{nd}, \rightarrow^{nd})$
is given by

• $S^{nd} = S$

• $Act^{nd} = Act_s$

• $s \xrightarrow{a_s}{}^{nd}\, s'$ iff $s \xrightarrow{a_s, a_e} s'$ for some $a_e \in \mathrm{App}_e(s)$.

Example 7.2 Figure 7.1(b) shows the induced non-deterministic planning problem of the
adversarial planning problem described in Example 7.1. ◇
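
The induced problem of Definition 7.8 amounts to projecting the environment action out of the transition relation. A minimal sketch for Example 7.1, assuming the same explicit tuple encoding as an illustration (the names `T_nd` and `next_nd` are introduced here):

```python
# Transition relation of Example 7.1 as (s, a_s, a_e, s') tuples.
T = {
    ("I", "+s", "-e", "F"), ("I", "-s", "-e", "U"),
    ("F", "-s", "-e", "F"), ("F", "+s", "+e", "F"),
    ("F", "+s", "-e", "G"), ("F", "-s", "+e", "G"),
    ("U", "-s", "-e", "D"), ("U", "+s", "+e", "U"),
    ("U", "+s", "-e", "G"), ("U", "-s", "+e", "G"),
}

# Induced relation: drop the environment action, keeping (s, a_s, s').
T_nd = {(s, a_s, s2) for (s, a_s, _a_e, s2) in T}

def next_nd(s, a_s):
    """Possible outcomes of system action a_s in s in the induced problem."""
    return {s2 for (s1, a, s2) in T_nd if s1 == s and a == a_s}

# The deterministic joint actions become non-deterministic system actions.
outcomes_U_plus = next_nd("U", "+s")
outcomes_I_plus = next_nd("I", "+s")
```

The environment's choice in U shows up only as the two possible outcomes of +s; the information about which environment action causes which outcome is lost.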

The least restricted environment plan is one where each state is associated with all applicable
actions of the environment. Let $\pi_e^{+}$ denote the least restricted environment plan defined
by

$$\pi_e^{+} = \{ (s, a_e) : a_e \in \mathrm{App}_e(s) \}. \tag{7.13}$$

For an environment plan to be non-empty, we require that it associates at least a single
action with any state where the set of applicable actions is non-empty. Otherwise, it is trivial
for the environment to construct a plan that prevents goal achievement for all executions. Let
$\Pi^{+}$ denote the set of non-empty environment plans

$$\Pi^{+} = \{ \pi_e : \forall s \,.\, \mathrm{ACT}_e(\pi_e, s) \cap \mathrm{App}_e(s) \neq \emptyset \}. \tag{7.14}$$

A  strong  plan  for  the  induced  non-deterministic  planning  problem  is  an  important  class 
of  adversarial  plans.  The  fact  that  a  strong  solution  exists  means  that  the  system  is  able  to 
achieve  its  goal  for  any  non-empty  environment  plan.  If  we  regard  the  domain  as  a  game 
between  the  system  and  environment,  such  plans  are  often  referred  to  as  winning  strategies 
(e.g., [3, 43]). Strong cyclic solutions to an induced non-deterministic planning problem,
on  the  other  hand,  have  limited  value  as  shown  in  Example  7.3. 

Example 7.3 There is no strong solution to the induced non-deterministic planning problem
shown in Figure 7.1. The reason is that there does not exist a system action for F or U
that guarantees a transition to G. A valid strong cyclic solution is

$$\pi_s = \{ (I, +s),\ (I, -s),\ (F, +s),\ (F, -s),\ (U, +s) \}.$$

This plan eventually reaches the goal if the environment is “friendly” and sometimes
executes action -e in state U. Such friendliness, however, is unlikely if the environment is an
opponent. ◇

The problem with strong cyclic solutions is that they assume the environment uses the
least restricted plan $\pi_e^{+}$. This is also the case for weak plans.

Theorem 7.1 Given an adversarial planning problem $\mathcal{P} = (\mathcal{D}, s_0, G)$, a non-deterministic
planning problem $\mathcal{P}^{nd} = (\mathcal{D}^{nd}, s_0^{nd}, G^{nd})$ induced from $\mathcal{P}$, and a plan $\pi_s$ for $\mathcal{P}^{nd}$:

• if $\pi_s$ is a weak solution then $\mathcal{M}(\pi_s, \pi_e^{+}), s_0 \models \mathrm{EF}\, G$,

• if $\pi_s$ is a strong cyclic solution then $\mathcal{M}(\pi_s, \pi_e^{+}), s_0 \models \mathrm{AG\,EF}\, G$,

• if $\pi_s$ is a strong solution then $\forall \pi_e \in \Pi^{+} \,.\, \mathcal{M}(\pi_s, \pi_e), s_0 \models \mathrm{AF}\, G$.

Proof. Follows directly from the definition of weak, strong cyclic, and strong plans and the
definition of the induced non-deterministic planning problem. □

As shown in Example 7.4, it turns out that there exist plans that are more powerful
than weak and strong cyclic plans for adversarial planning problems.

Example 7.4 The strong cyclic plan

$$\pi_s = \{ (I, +s),\ (I, -s),\ (F, +s),\ (F, -s),\ (U, +s) \}$$

described in Example 7.3 can be improved by avoiding the state U. This is done by the
following plan

$$\pi_s = \{ (I, +s),\ (F, +s),\ (F, -s) \}$$

which is guaranteed eventually to reach the goal for any non-empty strategy of the
environment, given that the system selects randomly between the actions in the plan. More
precisely, there is zero probability of an infinite execution that never reaches a goal state. ◇

Adversarial  planning  introduces  a  class  of  adversarial  weak  and  strong  cyclic  solutions  that, 
similarly  to  strong  solutions,  are  robust  to  any  plan  applied  by  the  environment. 

Definition 7.9 (Weak and Strong Cyclic Adversarial Plans) Given an adversarial planning
problem $\mathcal{P} = (\mathcal{D}, s_0, G)$ and a plan $\pi_s$ for $\mathcal{P}$:

• $\pi_s$ is a weak adversarial solution iff $\forall \pi_e \in \Pi^{+} \,.\, \mathcal{M}(\pi_s, \pi_e), s_0 \models \mathrm{EF}\, G$,

• $\pi_s$ is a strong cyclic adversarial solution iff $\forall \pi_e \in \Pi^{+} \,.\, \mathcal{M}(\pi_s, \pi_e), s_0 \models \mathrm{AG\,EF}\, G$.

Such plans can be synthesized by generalizing the approach in Example 7.4 and pruning
states from weak and strong cyclic plans where the game between the system and the
environment is unfair. We formalize this idea in the definition of a fair state. A state s is
fair with respect to a set of states C and a plan $\pi_s$ if s is not already a member of C and
for each applicable environment action there exists a counter action in $\pi_s$ such that the
joint action leads into C.

Definition 7.10 (Fair State) A state $s \notin C$ is fair with respect to a set of states $C$ and a
plan $\pi_s$ iff $\forall a_e \in \mathrm{App}_e(s) \,.\, \exists a_s \in \mathrm{ACT}_s(\pi_s, s),\ s' \in C \,.\, s \xrightarrow{a_s, a_e} s'$.

For  convenience,  we  define  an  unfair  state  to  be  a  state  that  is  not  fair. 
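
Fairness can be checked mechanically on Example 7.1. The sketch below is an explicit-state illustration, not the BDD-based computation; `fair_states` is a helper name introduced here, and it deliberately considers only states covered by the plan (an assumption of this sketch, since uncovered states without applicable environment actions would otherwise be fair vacuously).

```python
# Transition relation of Example 7.1 as (s, a_s, a_e, s') tuples.
T = {
    ("I", "+s", "-e", "F"), ("I", "-s", "-e", "U"),
    ("F", "-s", "-e", "F"), ("F", "+s", "+e", "F"),
    ("F", "+s", "-e", "G"), ("F", "-s", "+e", "G"),
    ("U", "-s", "-e", "D"), ("U", "+s", "+e", "U"),
    ("U", "+s", "-e", "G"), ("U", "-s", "+e", "G"),
}

def app_e(s):
    """Environment actions applicable in s."""
    return {a_e for (s1, _, a_e, _) in T if s1 == s}

def fair_states(pi_s, C):
    """Covered states outside C where every environment action has a counter action into C."""
    fair = set()
    for s in {s for (s, _) in pi_s} - C:
        if all(any(s1 == s and (s1, a_s) in pi_s and e == a_e and s2 in C
                   for (s1, a_s, e, s2) in T)
               for a_e in app_e(s)):
            fair.add(s)
    return fair

# The strong cyclic plan of Example 7.3.
pi_s = {("I", "+s"), ("I", "-s"), ("F", "+s"), ("F", "-s"), ("U", "+s")}

fair_wrt_goal = fair_states(pi_s, {"G"})
fair_wrt_goal_and_F = fair_states(pi_s, {"G", "F"})
```

F is fair with respect to {G} because each environment action has an opposite-sign counter action in the plan. U is never fair under this plan: the only counter action to +e is -s, which the plan does not contain.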


7.2  Adversarial  Planning  Algorithms 

Weak and strong cyclic adversarial plans can be synthesized by modifying the ordinary
weak and strong cyclic precomponents and employing the generic non-deterministic planning
algorithm shown in Figure 3.8.²

The core computations of the precomponent functions are to find the fair states of a plan
$\pi_s$ with respect to a set of states $C$ and to compute the preimage of system state-action pairs
(SSAs) of a set of states $C$:

$$\mathrm{FairStates}(\pi_s, C) = \{ s \notin C : \forall a_e \in \mathrm{App}_e(s) \,.\, \exists a_s \in \mathrm{ACT}_s(\pi_s, s),\ s' \in C \,.\, s \xrightarrow{a_s, a_e} s' \} \tag{7.15}$$

$$\mathrm{PreImgSSA}(C) = \{ (s, a_s) : \mathrm{Next}_s(s, a_s) \cap C \neq \emptyset \} \tag{7.16}$$

7.2.1  Weak  Adversarial  Precomponents 

The weak adversarial precomponent consists of an ordinary weak precomponent pruned
for unfair states. The precomponent function is shown in Figure 7.2. Let WeakAdversarial
denote the NDP algorithm using the weak adversarial precomponent. Since the
pruning of unfair states makes the SSAs in the precomponent robust for any non-empty
environment plan, it can be shown that WeakAdversarial is sound, complete, and
terminating.

function PreCompWA(C)
1  wSA ← PreImgSSA(C) \ (C × Act_s)
2  waSA ← wSA ∩ (FairStates(wSA, C) × Act_s)
3  return waSA

Figure 7.2: The weak adversarial precomponent function.

² When applying this algorithm for adversarial planning, the function States is substituted with the
function States_s defined above.

Theorem 7.2 (Correctness of WeakAdversarial) The WeakAdversarial planning
algorithm is correct. The algorithm returns “no solution exists” iff no solution exists;
otherwise it returns a valid solution.

Proof. This follows from the soundness, completeness, and termination theorems of
WeakAdversarial proven in Appendix B. □

A  guided  version  of  WeakAdversarial  can  be  defined  by  using  an  approach  similar 
to  GuidedWeak. 
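
The weak adversarial precomponent of Figure 7.2 can be sketched on explicit sets as follows. This is an illustrative set-based version under the encoding assumed here for Example 7.1; the function names mirror the figure but the implementation details (tuples, Python sets) are assumptions, whereas the thesis computes these sets with BDDs.

```python
# Transition relation of Example 7.1 as (s, a_s, a_e, s') tuples.
T = {
    ("I", "+s", "-e", "F"), ("I", "-s", "-e", "U"),
    ("F", "-s", "-e", "F"), ("F", "+s", "+e", "F"),
    ("F", "+s", "-e", "G"), ("F", "-s", "+e", "G"),
    ("U", "-s", "-e", "D"), ("U", "+s", "+e", "U"),
    ("U", "+s", "-e", "G"), ("U", "-s", "+e", "G"),
}

def pre_img_ssa(C):
    """SSAs (s, a_s) with at least one outcome in C (Equation 7.16)."""
    return {(s, a_s) for (s, a_s, _, s2) in T if s2 in C}

def fair_states(sa, C):
    """States covered by sa, outside C, that are fair w.r.t. C and sa (Equation 7.15)."""
    fair = set()
    for s in {s for (s, _) in sa} - C:
        env_actions = {a_e for (s1, _, a_e, _) in T if s1 == s}
        if all(any(s1 == s and (s1, a_s) in sa and e == a_e and s2 in C
                   for (s1, a_s, e, s2) in T)
               for a_e in env_actions):
            fair.add(s)
    return fair

def pre_comp_wa(C):
    """Weak precomponent of C pruned for unfair states (Figure 7.2)."""
    wsa = {(s, a) for (s, a) in pre_img_ssa(C) if s not in C}   # line 1
    fair = fair_states(wsa, C)
    return {(s, a) for (s, a) in wsa if s in fair}              # line 2

first = pre_comp_wa({"G"})
second = pre_comp_wa({"G", "F", "U"})
```

Note that U is kept in the first precomponent: with both +s and -s available, every environment action has a counter action into G, even though -s may also lead to the dead end D. This is consistent with weak adversarial plans only guaranteeing a non-zero probability of success.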

7.2.2  Strong  Cyclic  Adversarial  Precomponents 

Similarly to the strong cyclic precomponent, the strong cyclic adversarial precomponent is
computed by iteratively expanding a candidate set and trying to show that it contains a valid
precomponent. The precomponent function is shown in Figure 7.3. The main difference
between the strong cyclic adversarial precomponent function and the strong cyclic
precomponent function is that the auxiliary function SCAPlanAux prunes unfair states from the
precomponent instead of only unconnected states. This is done by iteratively computing
the set of fair states in the precomponent starting from the covered states C. The
computation also removes all unconnected states. Let StrongCyclicAdversarial denote
the NDP algorithm using the strong cyclic adversarial precomponent. It can be shown that
StrongCyclicAdversarial is sound, complete, and terminating.

Theorem 7.3 (Correctness of StrongCyclicAdversarial) The StrongCyclicAdversarial
planning algorithm is correct. The algorithm returns “no solution exists” iff no
solution exists; otherwise it returns a valid solution.

Proof. This follows from the soundness, completeness, and termination theorems of
StrongCyclicAdversarial proven in Appendix B. □

A guided version of StrongCyclicAdversarial can be defined by using an
approach similar to GuidedStrongCyclic.

Example 7.5 Consider the strong cyclic adversarial precomponent computed from the goal
state G of the adversarial planning problem introduced in Example 7.1. The first candidate
precomponent is shown in Figure 7.4(a). Action -s has to be pruned from U since
it has an outgoing transition. The pruned candidate is shown in Figure 7.4(b). Now there
is no action leading to G in U when the environment chooses +e. U has become unfair
and must be pruned from the candidate. The resulting candidate is shown in Figure 7.4(c).
Since the remaining candidate is non-empty and no further state-action pairs need to be
pruned, a non-empty strong cyclic adversarial precomponent has been found. ◇

function PreCompSCA(C)
1  wSA ← ∅
2  repeat
3      OldwSA ← wSA
4      wSA ← PreImgSSA(C ∪ States_s(wSA)) \ (C × Act_s)
5      SCA ← SCAPlanAux(wSA, C)
6  until SCA ≠ ∅ ∨ wSA = OldwSA
7  return SCA

function SCAPlanAux(startSA, C)
1  SA ← startSA
2  repeat
3      OldSA ← SA
4      SA ← PruneOutgoing(SA, C)
5      SA ← PruneUnfair(SA, C)
6  until SA = OldSA
7  return SA

function PruneOutgoing(SA, C)
1  NewSA ← SA \ PreImgSSA(complement of (C ∪ States_s(SA)))
2  return NewSA

function PruneUnfair(SA, C)
1  NewSA ← ∅
2  repeat
3      OldSA ← NewSA
4      NewSA ← SA ∩ (FairStates(SA, C ∪ States_s(NewSA)) × Act_s)
5  until NewSA = OldSA
6  return NewSA

Figure 7.3: The strong cyclic adversarial precomponent function.
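
The functions of Figure 7.3 can be traced on Example 7.1 with an explicit-state sketch. Several points are assumptions of this sketch rather than statements about the thesis' implementation: sets are plain Python sets instead of BDDs; `prune_unfair` accumulates fair SSAs across iterations so that the loop grows monotonically to a fixed point; and `plan_sca` is a simplified stand-in for the generic planning loop of Figure 3.8, which is not reproduced in this section.

```python
STATES = {"I", "F", "D", "U", "G"}
T = {  # transition relation of Example 7.1 as (s, a_s, a_e, s') tuples
    ("I", "+s", "-e", "F"), ("I", "-s", "-e", "U"),
    ("F", "-s", "-e", "F"), ("F", "+s", "+e", "F"),
    ("F", "+s", "-e", "G"), ("F", "-s", "+e", "G"),
    ("U", "-s", "-e", "D"), ("U", "+s", "+e", "U"),
    ("U", "+s", "-e", "G"), ("U", "-s", "+e", "G"),
}

def states_of(sa):
    return {s for (s, _) in sa}

def pre_img_ssa(C):
    return {(s, a_s) for (s, a_s, _, s2) in T if s2 in C}

def fair_states(sa, C):
    fair = set()
    for s in states_of(sa) - C:
        env_actions = {a_e for (s1, _, a_e, _) in T if s1 == s}
        if all(any(s1 == s and (s1, a_s) in sa and e == a_e and s2 in C
                   for (s1, a_s, e, s2) in T)
               for a_e in env_actions):
            fair.add(s)
    return fair

def prune_outgoing(sa, C):
    # Remove SSAs that may lead outside C and the states covered by sa.
    outside = STATES - (C | states_of(sa))
    return sa - pre_img_ssa(outside)

def prune_unfair(sa, C):
    new_sa = set()
    while True:
        old_sa = new_sa
        fair = fair_states(sa, C | states_of(old_sa))
        # Assumption of this sketch: keep the SSAs found fair in earlier iterations.
        new_sa = old_sa | {(s, a) for (s, a) in sa if s in fair}
        if new_sa == old_sa:
            return new_sa

def sca_plan_aux(start_sa, C):
    sa = set(start_sa)
    while True:
        old_sa = sa
        sa = prune_unfair(prune_outgoing(sa, C), C)
        if sa == old_sa:
            return sa

def pre_comp_sca(C):
    wsa = set()
    while True:
        old_wsa = wsa
        wsa = {(s, a) for (s, a) in pre_img_ssa(C | states_of(wsa)) if s not in C}
        sca = sca_plan_aux(wsa, C)
        if sca or wsa == old_wsa:
            return sca

def plan_sca(s0, goals):
    """Simplified backward planning loop (hypothetical stand-in for Figure 3.8)."""
    covered, plan = set(goals), set()
    while s0 not in covered:
        pre = pre_comp_sca(covered)
        if not pre:
            return None          # no solution exists
        plan |= pre
        covered |= states_of(pre)
    return plan
```

Under these assumptions, `pre_comp_sca({"G"})` yields the two SSAs of F, matching the candidate of Figure 7.4(c), and `plan_sca("I", {"G"})` yields the plan given in Example 7.4.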


7.3  Action  Selection  Strategies 

A strong cyclic adversarial plan guarantees that no intelligent environment can choose a
plan that forces executions to cycle forever without ever reaching a goal state. In principle,
though, infinite paths never reaching a goal state can still be produced by a system that
“keeps losing” to the environment. However, by assuming the system selects randomly
between actions in its plan, we can show that the probability of producing such execution
paths is zero.

Figure 7.4: (a) The first candidate of PreCompSCA(G) for the problem shown in
Figure 7.1(a). (b) The candidate pruned for actions with outgoing transitions. (c) The
remaining candidate pruned for unfair states. Since no further SSAs are pruned, this
is the strong cyclic adversarial precomponent returned by PreCompSCA(G).

Theorem  7.4  (Termination  of  Strong  Cyclic  Adversarial)  By  choosing  actions  randomly 
from  a  strong  cyclic  adversarial  plan  7rs  produced  by  STRONGCYCLICADVERSARIAL 
given  the  adversarial  planning  problem  V  =  ('. D ,  s0, 

G),  any  execution  path  will  eventually  reach  a  gocd  state.3 

Proof. Since all unfair states and actions with transitions leading out of the states covered by
π_s have been removed, all the visited states of an execution path will be fair and covered by
the plan. Assume without loss of generality that n strong cyclic adversarial precomponents
were computed in order to generate π_s. Due to the definition of precomponent functions,
we can then partition the set of states covered by π_s into n + 1 ordered subsets C_n, ..., C_0
where s_0 ∈ C_n, C_0 = G, and C_i for 0 < i ≤ n contains the states covered by precomponent
i. Consider an arbitrary subset C_i. Assume that there were m iterations of the repeat loop in
the last call to PruneUnfair when computing precomponent i. We can then subpartition
C_i into m ordered subsets C_{i,m}, ..., C_{i,1} where C_{i,j} contains the states of the SSAs added
to NewSA in iteration j of PruneUnfair. Due to the definition of FairStates, we have
that the states in C_{i,j} are fair with respect to π_s and the states C given by

$$C = \bigcup_{k=1}^{j-1} C_{i,k} \;\cup\; \bigcup_{k=0}^{i-1} C_k.$$

By flattening the hierarchical ordering of the partitions C_n, ..., C_0 and their subpartitions,
we can assume without loss of generality that we get the ordered partitioning L_T, ..., L_0
where L_0 = C_0. Given that actions are selected uniformly in π_s, the fairness between the
states in the levels guarantees that there is a non-zero probability to transition to a state in
L_{j-1}, ..., L_0 from any state in L_j. Consequently, an execution path only reaching states
covered by π_s will eventually reach a state in L_0. □

3 It is likely that the theorem holds for any strong cyclic adversarial plan satisfying Definition 7.9.
However, the proof must be strengthened to show this.
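
The last step of the proof of Theorem 7.4 can be made quantitative. The following is only a sketch under two assumptions that are not stated in the text: the state space is finite, and p > 0 denotes the smallest probability, over all covered non-goal states and all environment responses, that a uniformly chosen plan action moves the execution to a lower level of the flattened partitioning L_T, ..., L_0. The symbols p and τ are introduced here for illustration, with τ the first time step at which a state in L_0 is visited. Since at most T descents are needed from any level, the probability of not having reached L_0 then decays geometrically:

```latex
% Sketch only: p and \tau are illustrative symbols, not notation from the thesis.
% \tau is the first step at which the execution path visits a state in L_0.
\begin{align*}
\Pr[\,\tau \le t + T \mid \tau > t\,] &\ge p^{T}
  && \text{(at most $T$ consecutive descents, each with probability $\ge p$)} \\
\Pr[\,\tau > kT\,] &\le \left(1 - p^{T}\right)^{k}
  && \text{(apply the first bound to $k$ consecutive blocks of $T$ steps)} \\
\lim_{k \to \infty} \left(1 - p^{T}\right)^{k} &= 0
  && \text{(so the goal is reached with probability one)}
\end{align*}
```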

For  weak  adversarial  plans,  it  is  impossible  to  guarantee  that  a  goal  state  eventually  is 
reached  since  an  execution  path  may  reach  a  dead  end.  On  the  other  hand,  by  selecting 
actions  randomly  from  a  weak  adversarial  plan  there  is  a  non-zero  probability  of  reaching 
a  goal  state. 

Theorem 7.5 (Progress of Weak Adversarial) By choosing actions randomly from a weak
adversarial plan π_s produced by WeakAdversarial given the adversarial planning
problem P = (D, s_0, G), there is a non-zero probability of eventually reaching the goal.4

Proof. Assume without loss of generality that n weak adversarial precomponents were
computed in order to generate π_s. Due to the definition of precomponent functions, we can
then partition the set of states covered by π_s into n + 1 ordered subsets C_n, ..., C_0 where
s_0 ∈ C_n, C_0 = G, and C_i for 0 < i ≤ n contains the states covered by precomponent i.
Consider an arbitrary subset C_i. Due to the definition of FairStates, we have that the
states in C_i are fair with respect to π_s and the states C given by

$$C = \bigcup_{k=0}^{i-1} C_k.$$

Thus, given that actions are selected uniformly in π_s, we have a non-zero probability to
transition to a state in C_{i-1}, ..., C_0 from any state in C_i. Consequently, there is a non-zero
probability of an execution path starting in s_0 and reaching a goal state in G. □


7.4  Experimental  Evaluation 

The performance of WeakAdversarial and StrongCyclicAdversarial has been
evaluated in two domains. The first of these is a parameterized version of the example
domain shown in Figure 7.1. The second is a grid world with a hunter and prey. Due to time
limitations and the lack of benchmark problems, the guided versions of the algorithms have
not been studied. However, we expect performance improvements between the blind and
guided versions of the adversarial algorithms that are similar to the performance
improvements obtained for the blind and guided versions of the weak and strong cyclic
algorithms in Section 5.3.

4 It is likely that the theorem holds for any weak adversarial plan satisfying Definition 7.9. However, the
proof must be strengthened to show this.

All experiments are carried out using the BIFROST 0.7 search engine and the
experimental setting described in Appendix A. The problems of both domains have been
described in NADL+. As usual, we use n to denote the number of BDD nodes allocated to
represent the shared BDD, and c to denote the number of BDD nodes allocated to represent
BDDs in the operator caches used to implement dynamic programming. Total CPU time is
measured in seconds and includes time spent on allocating memory for the BDD package
and parsing the problem description.


7.4.1  Parameterized  Example  Domain 


The parameterized example domain considers system and environment actions {+s, -s, l}
and {+e, -e}, respectively. The domain is shown in Figure 7.5. The initial state is
s_0 = I and the goal states are G = {g_1, g_2}. Progress toward the goal states is made if the
signs of the two actions in the joint action are different. At any time, the system can cause
a switch from the lower to the upper row of states by executing l. In the upper row, the
system can execute only +s. Thus, in these states an adversarial environment can prevent
further progress by always executing +e.

Figure 7.5: The generalized example domain shown in Figure 7.1(a). [Diagram not reproduced;
its edges are labeled with the joint actions (+s,+e), (+s,-e), (-s,+e), (-s,-e), (l,+e), and (l,-e).]

Figure 7.6 shows the total CPU time and the size of the plans produced by the ordinary weak
algorithm compared to the weak adversarial algorithm, and by the ordinary strong cyclic
algorithm compared to the strong cyclic adversarial algorithm. The BDD variable ordering
was identical in all of these experiments. For each experiment, the BDD package was
initialized with n = 1M and c = 700K. The total time used for memory allocation was
0.7 seconds.

Figure 7.6: Results of the parameterized example domain. [Plots not reproduced; panels:
Weak Planning and Strong Cyclic Planning.]

Due to the structure of the domain, the length of a shortest path between the initial state
and one of the goal states grows linearly with the number of states. Since the four algorithms
must compute at least one preimage for each step in a shortest path between the initial state
and one of the goal states, their complexity is at least exponential in the number of Boolean
state variables. The experimental results seem to confirm this. In this domain, there is only
a small overhead of generating adversarial plans compared to non-adversarial plans. The
quality of the produced plans, however, is very different. For instance, the strong cyclic
adversarial plans consider executing only -s and +s, while the strong cyclic plans consider
all applicable actions. The strong cyclic adversarial plan is guaranteed to achieve the goal.
In contrast, the probability of achieving the goal in the worst case for the strong cyclic plan
is less than (2/3)^(N/2-1), where N is the number of states in the domain. Thus, for an
adversarial environment the probability of reaching the goal with a strong cyclic plan is
practically zero, even for small instances of the problem.
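
To get a feeling for how quickly the worst-case success probability of the ordinary strong cyclic plan vanishes, the bound can be evaluated numerically. The sketch below assumes that the bound reads (2/3)^(N/2-1), that is, a system choosing uniformly among three actions must avoid l in each of N/2 - 1 consecutive steps; the function name and the sampled values of N are illustrative only.

```python
from fractions import Fraction


def worst_case_bound(num_states: int) -> Fraction:
    """Assumed upper bound (2/3)^(N/2 - 1) on the worst-case goal probability."""
    if num_states < 2 or num_states % 2 != 0:
        raise ValueError("expected an even number of states N >= 2")
    return Fraction(2, 3) ** (num_states // 2 - 1)


# Hypothetical domain sizes, chosen only to show the trend.
for n in (4, 16, 64, 256):
    print(n, float(worst_case_bound(n)))
```

Exact rational arithmetic is used so that the bound is not distorted by rounding before it is printed.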

7.4.2  Hunter  and  Prey  Domain 

The hunter and prey domain consists of a hunter and a prey agent moving on a chess board.
Initially, the hunter is at the lower left position of the board and the prey is at the upper
right. The initial state of the game is shown in Figure 7.7. The task of the hunter is to catch
the prey. This happens if the hunter and prey at some point are at the same position. The
hunter and prey move simultaneously. They are not aware of each other's moves before both
moves are carried out. In each step, they can either stay at the spot or move like a king in
chess. However, if the prey reaches the lower left corner position, it may change the moves
of the hunter to those of a bishop (making single step moves). This has a dramatic impact
on the game, since the hunter then can move only on positions with the same color. Thus,
to avoid the hunter, the prey just has to stay at positions with the opposite color. A strong
cyclic adversarial plan therefore only exists if it is possible for the hunter to find a plan that
guarantees that the prey never gets to the lower left corner. A strong cyclic plan, on the
other hand, does not differentiate between whether the hunter moves like a chess king or a
bishop. In both cases, a “friendly” prey can be caught.
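
The color argument can be checked mechanically. The following sketch is an illustration only: it assumes squares are (column, row) pairs with the lower left corner at (0, 0), takes the color of a square to be the parity of column + row, and uses hypothetical names throughout. It enumerates the squares a hunter can reach from the corner with single king steps and with single bishop steps.

```python
# Illustrative sketch: squares are (column, row) pairs, colour is (column + row) % 2.
KING_STEPS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
BISHOP_STEPS = [(dx, dy) for dx in (-1, 1) for dy in (-1, 1)]


def reachable(start, steps, size):
    """Return every square reachable from start by repeated single steps on a size x size board."""
    seen = {start}
    frontier = [start]
    while frontier:
        x, y = frontier.pop()
        for dx, dy in steps:
            nxt = (x + dx, y + dy)
            if 0 <= nxt[0] < size and 0 <= nxt[1] < size and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def colours(squares):
    """Set of square colours (0 or 1) occurring in squares."""
    return {(x + y) % 2 for x, y in squares}


king_squares = reachable((0, 0), KING_STEPS, 8)
bishop_squares = reachable((0, 0), BISHOP_STEPS, 8)
print(len(king_squares), colours(king_squares))      # every square, both colours
print(len(bishop_squares), colours(bishop_squares))  # half of the squares, one colour
```

Every bishop step changes column and row by one each, so the parity of their sum never changes; a prey standing on a square of the other parity can therefore never be reached.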

We consider a parameterized version of the domain with the size of the chess board
ranging from 8 x 8 to 512 x 512. For the 8 x 8 board, we need 3 Boolean variables to
represent each of the vertical and horizontal locations of an agent. With two agents, this
gives 4 * 3 = 12 Boolean variables. Similarly, for the 512 x 512 board, we need
4 * 9 = 36 Boolean variables. Figure 7.8 shows the total CPU time and the size of the plans
produced by the ordinary weak algorithm compared to the weak adversarial algorithm, and by
the ordinary strong cyclic algorithm compared to the strong cyclic adversarial algorithm.
For each experiment, the BDD package was initialized with n = 12M and c = 500K. The
total time used for memory allocation was 2.8 seconds.

Figure 7.8: Results of the hunter and prey domain. [Plots not reproduced; panels: Weak
Planning and Strong Cyclic Planning, each with x-axis "Number of Boolean State Variables".]

In this domain both weak and strong cyclic adversarial plans are larger and take
substantially longer time to generate than ordinary plans. The strong cyclic adversarial
algorithm spends more than 4000 seconds for problems with 28 Boolean state variables or
more. However, as discussed above, it is non-trivial to determine whether there exists a
strategy of the hunter that guarantees that the prey never succeeds in reaching the lower
left corner. Thus, we may expect these plans to be computationally harder than strong
cyclic plans. This interpretation is supported by the size of the plans. The adversarial plans
are substantially larger than the ordinary plans. We have not analysed the strong cyclic
adversarial plans in detail, but they must at least ensure that the prey never gets closer to the
lower left corner than the hunter. The hunter starts at the lower left corner and can therefore
ensure that the prey must “risk its life” even by getting to a location with the same distance
to the lower left corner as the hunter. Since the hunter still has control over the game in this
situation, it has a positive chance of winning. Thus, a strong cyclic adversarial plan exists.
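
The variable counts quoted for the smallest and largest boards follow from a binary encoding of the row and column of each agent. The sketch below reproduces that arithmetic; the function name is hypothetical, and the assumption that the intermediate board sizes are the powers of two between 8 and 512 is ours rather than something stated in the text.

```python
def boolean_state_variables(board_size: int) -> int:
    """Variables needed to encode row and column of both agents on a board_size x board_size board."""
    # (board_size - 1).bit_length() equals ceil(log2(board_size)) for board_size >= 1.
    bits_per_coordinate = (board_size - 1).bit_length()
    return 4 * bits_per_coordinate


# Assumed sequence of board sizes (powers of two from 8 to 512).
for size in (8, 16, 32, 64, 128, 256, 512):
    print(size, boolean_state_variables(size))
```

Under this reading, the problems with 28 Boolean state variables would correspond to a 128 x 128 board.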


7.5  Conclusion 

The two major design goals of the weak and strong cyclic adversarial planning algorithms
are correctness and efficiency. With respect to correctness, the strong cyclic adversarial
algorithm has been chosen to closely match the strong cyclic algorithm such that a
similar proof strategy can be applied. With respect to efficiency, three major choices have
been made. First of all, we use a BDD-based implementation. Second, as for the strong
cyclic algorithm, we build up a strong cyclic adversarial plan incrementally from the goal
states. Alternatively, the algorithm could iteratively prune a largest possible plan, as in the
algorithm suggested in [43]. However, this approach seems less efficient. Finally, guided
versions of the algorithms can be defined using non-deterministic state-set branching. An
interesting direction for future work is to combine fault tolerant and adversarial planning
and to consider an explicit set of goal states of the environment agent [23].


7.6  Summary 

In this chapter, we have introduced a new framework called adversarial planning to address
domains where non-determinism is caused by simultaneous actions of a controllable system
and an uncontrollable environment. We have shown that the usual abstraction of environment
actions may lead to solutions where an adversarial environment may cause execution
paths never to reach a goal state. This can be avoided by pruning states from the plans where
the local “game” between the system and environment is unfair. We have introduced
adversarial versions of the weak and strong cyclic non-deterministic planning algorithms and
shown that they are sound, complete, and terminating. The experimental evaluation shows
that adversarial plans may be harder to produce than ordinary non-deterministic plans.
However, this is to be expected since they often represent more complex control strategies.


Chapter 8
Related Work


The discussion of related work is divided into five sections corresponding to the five main
contributions of the thesis. Section 8.1 describes work related to BDD-based deterministic
planning and state-set branching in Artificial Intelligence (AI) and formal verification.
Section 8.2 first discusses alternative approaches to non-deterministic planning in AI, automata
theory, game theory, and Discrete Event System (DES) control theory and then focuses on
work closely related to non-deterministic state-set branching. Section 8.3 presents work
related to fault tolerant planning within DES control theory and AI, and Section 8.4 describes
work related to adversarial planning developed in automata theory, game theory, AI, and
formal verification. Finally, Section 8.5 reviews related work on planning languages
developed in AI, formal verification, and DES control theory.


8.1  Deterministic  Planning  and  Heuristic  Search 

An interesting fact is that even the earliest planning systems were using a symbolic
representation of the state space. The most popular representation is STRIPS, where a search
state is a set of facts that are true in the set of domain states that the search state represents
(a detailed description of STRIPS planning is given in Section 3.1). Progression planners search
forward in the fact space. Since they start from a set of facts representing a single initial
state, each search state corresponds to a single state. Hence, for progression planners, no
space savings are obtained with the symbolic representation compared to an explicit
representation. Regression planners, on the other hand, search backward in the fact space. Since
this search starts from a single set of facts representing a set of goal states, each search
state may represent several domain states. Thus, regression planners may benefit from the
symbolic state representation.
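
The difference between the two search directions can be illustrated by counting the domain states that a set of facts stands for. The sketch below is a simplification under stated assumptions: a progression search state is read under a closed-world interpretation (unlisted propositions are false), a regression search state leaves unlisted propositions unconstrained, and the proposition names are hypothetical.

```python
from itertools import combinations


def represented_states(facts, propositions, closed_world):
    """Enumerate the domain states (sets of true propositions) that a fact set stands for."""
    facts = frozenset(facts)
    if closed_world:
        # Progression: every proposition not listed is false, so there is exactly one state.
        return [facts]
    # Regression: propositions not listed may be either true or false.
    free = sorted(set(propositions) - facts)
    return [facts | frozenset(extra)
            for r in range(len(free) + 1)
            for extra in combinations(free, r)]


PROPOSITIONS = {"at_home", "has_key", "door_open"}  # hypothetical propositions

progression = represented_states({"door_open"}, PROPOSITIONS, closed_world=True)
regression = represented_states({"door_open"}, PROPOSITIONS, closed_world=False)
print(len(progression), len(regression))
```

With k propositions left unconstrained, a single regression search state covers 2^k domain states, which is where the space saving comes from.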



A wide range of planning systems and search techniques have been developed within
the STRIPS framework. These planning systems are often referred to as classical
planners and roughly fall in three classes: state space planners (e.g., Prodigy [133]), plan
space planners (e.g., SNLP [116] and UCPOP [163]), and hierarchical planners (e.g., SIPE
[171]). Probably the most advanced state space planner is Prodigy. Prodigy performs
a bidirectional search in the fact space guided by search control rules, means-ends
analysis, and sub-goaling [126]. Plan space planners, on the other hand, carry out a search in
a space of possible plans using the least commitment principle, where orderings between
actions are introduced only if the actions are causally linked or interfere. This may lead to
better performance in some domains [8]. However, plan space planners commit to causal
links in much the same way that state space planners commit to step ordering. In general,
they do not outperform state space planners [165]. Hierarchical planners such as SIPE use
hierarchical task networks (HTNs) to apply abstraction in the search. First, a solution is
found at an abstract level, which is then refined to a concrete plan.

The scalability of classical STRIPS planning was substantially improved by the
introduction of GraphPlan [18] that avoids the state space explosion problem by using a
planning graph to guide the search. Graph planners [18, 114, 104] use a two step approach.
In the first step, the planning graph is generated. The planning graph consists of alternating
action and state layers and keeps track of the interferences between actions and states,
resulting in a compact representation of the reachable states. In the second step, a plan is
extracted from the planning graph by a backwards search. Graph planners relax optimality
constraints by only finding parallel optimal plans, i.e., plans with shortest length assuming
that actions can be applied concurrently in each step.

One  of  the  current  trends  in  automated  planning  is  to  reduce  planning  to  other  problems. 
SAT  planners,  like  SATPlan  [99],  encode  a  planning  problem  as  a  satisfiability  problem 
of  a  Boolean  expression  stating  goal  achievement  within  a  certain  number  of  steps.  Using 
binary  search,  this  approach  can  be  optimal,  but  better  results  have  been  obtained  using 
GraphPlan’s  parallel  relaxation  by  encoding  goal  achievement  of  the  planning  graph  as 
a SAT problem (BlackBox, [100]). Planning as satisfiability, however, suffers from the
fact  that  the  number  of  Boolean  variables  grows  linearly  with  the  plan  length.  In  addition, 
the  clauses  of  a  planning  problem  form  long  dependency  chains  (corresponding  to  plans) 
that  are  known  to  be  a  worst  case  structure  for  SAT  checkers  [9]. 

A few experiments have also been carried out reducing planning to integer programming
[19, 101]. This approach works well if a considerable part of the planning problem
involves  numerical  constraints.  Good  performance,  however,  has  not  been  obtained  for 
purely  combinatorial  problems. 

The first application of BDDs for deterministic planning was based on reduction of
planning to symbolic model checking (deterministic MBP, [33]), where the plan corresponds
to a counter example of a verified property. The approach has been shown to be
competitive with GraphPlan and SATPlan in several classical domains. More recent
approaches are MIPS 1.0, DOP, BddPlan, and PropPlan [54, 92, 80, 59]. All of these
planners rely on blind BDD-based breadth-first search. MIPS 1.0 and DOP use a specialized
preprocessing of domains to find compact Boolean state encodings [50]. Both
planners apply bidirectional search from the initial and goal states. BddPlan and PropPlan
are simpler BDD-based planners without domain preprocessing and have poor
performance compared to MIPS 1.0 and DOP.

Blind BDD-based search is currently one of the most efficient approaches for finding
optimal plans in deterministic domains [86]. However, when optimality constraints are
relaxed, the currently most efficient approaches for the benchmark problems considered at
the AIPS planning competitions [113, 4, 115] are pure heuristic planners like HSP, FF, and
AltAlt [20, 176, 77]. It has been shown, though, that the HSPr-derived heuristics used
by these planners have no plateaus in the competition domains, making simple hill climbing
a sufficiently strong search approach to find a solution [79]. It seems implausible that such
strong heuristics are easy to define for a larger set of benchmark problems.

State-Set  Branching 

As far as we know, state-set branching is the first general framework for combining heuristic
search and BDD-based search. All previous work has been restricted to particular
algorithms. BDD-based heuristic search has been investigated independently in symbolic
model checking and AI. The pioneering work is in symbolic model checking, where heuristic
search has been used to falsify design invariants by finding error traces. Yuan et al.
[180] study a bidirectional search algorithm pruning frontier states according to their
minimum Hamming distance to error states. BDDs representing Hamming distance equivalence
classes are precomputed and conjoined with BDDs representing the search frontier
during search. Yang and Dill [179] also consider minimum Hamming distance as heuristic
function in an ordinary pure heuristic search algorithm. They develop a specialized BDD
operation for splitting a set of states according to their minimum Hamming distance to a set
of error states. The operation is efficient: its complexity is linear in the size of the BDD
representing the error states. However, it is unclear how such an operation can be generalized
to other heuristic functions. In addition, this approach finds next states and splits them
according to their Hamming distance to the goal states in two separate phases, where the
first phase is as complex as the single expansion phase used by state-set branching.
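
What it means to split a frontier by minimum Hamming distance can be shown with explicit states. The sketch below is not the BDD operation of [179] or the precomputed equivalence classes of [180]; it enumerates states one by one, encodes each state as an integer bit vector, and uses hypothetical function names, purely to illustrate the resulting partition.

```python
from collections import defaultdict


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which the state encodings a and b differ."""
    return bin(a ^ b).count("1")


def split_by_min_hamming(frontier, error_states):
    """Partition frontier into classes keyed by minimum Hamming distance to any error state."""
    classes = defaultdict(set)
    for state in frontier:
        distance = min(hamming(state, error) for error in error_states)
        classes[distance].add(state)
    return dict(classes)


frontier = {0b000, 0b011, 0b101, 0b111}
error_states = {0b111}
classes = split_by_min_hamming(frontier, error_states)
best = classes[min(classes)]  # a pure heuristic search would expand this class first
print(classes, best)
```

A symbolic implementation obtains the same classes without visiting individual states, which is what makes the approach viable for large frontiers.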

In general, heuristic BDD-based search has received little attention in symbolic model
checking. There may be several reasons for this:

1. Culture. Heuristics and heuristic search have mainly been studied in AI.

2. Lack of Efficient Heuristics. Symbolic model checking problems often consider
sequential circuits where all state variables are changed in each step. The diameter of
the transition graph may be too low for an efficient search heuristic to exist [144].

3. A Different Problem. Planning problems are inherently different from verification
problems. In order to verify a system all reachable states must be explored. However,
in order to solve a planning problem only a single path from the initial state to the
goal state needs to be found.

An important exception to the second statement is verification of asynchronous systems. In
the SPIN validator [81] and Java PathFinder [166], several heuristics have been studied to
guide the search toward counter examples (e.g., [52, 69]). In addition, a number of heuristic
methods have been developed to guide the exploration of a CTL formula in order to reduce
the complexity of the model checking problem [17, 82].

In AI, an implementation of A* called BDDA* was developed by Edelkamp and Reffel
[53]. BDDA* can use any heuristic function and has been applied to planning as well as
model checking [144]. Edelkamp later describes a more general implementation of BDDA*
not assuming unit-cost transitions and with cycle detection for monotonic heuristic
functions [48]. Both of these versions of BDDA*, however, are fairly direct implementations
of A* with BDDs that imitate the usual explicit application of the heuristic function via
complex symbolic arithmetic. Our experimental results show that the successor state
computation of BDDA* scales poorly. For this reason a major philosophy in the design of
state-set branching has been to avoid arithmetic operations at the BDD level. An ADD-based
implementation of A* called ADDA* [74] has been developed after the first publication
of state-set branching. ADDs [6] generalize BDDs to finite valued functions. ADDA*
is similar to BDDA* but implements cycle detection for general heuristic functions. The
ADD may handle arithmetic computations more efficiently than the BDD [168]. However,
ADDA* has not been shown to perform better than BDDA* [74].

The high performance of state-set branching is achieved by the branching partitioning
that combines an efficient partitioned image and preimage computation with a propagation
of search node information from parent to child states. The philosophy of state-set
branching is that the information represented by BDDs must be semantically closely related
in order for the BDD operations to work efficiently. Hence, in contrast to BDDA*
and ADDA*, we separate the representation of information used by the search algorithm
from the representation of states and transitions and only employ BDDs to encode the latter.
To our knowledge, this idea is genuinely new. We have not been able to find any previous
work in AI, control theory, automata theory, or formal verification that uses a transition
relation partitioning for propagating any kind of state information. There seem to
be several circumstances that may explain why this particular stone has never been turned
before. In AI, the main reason seems to be that the amount of work involving BDDs is still
very limited. The only BDD-based classical heuristic search algorithms in AI are BDDA*
and ADDA*, and these algorithms seem to rely on a quite different design philosophy where
as much information as possible has been pushed to the BDD level. In control theory and
automata theory, symbolic controller synthesis has been suggested but not sufficiently
investigated. As far as we know, there has not been any work on guided synthesis algorithms.
A relevant area in which to expect previous work is formal verification. There is a large body
of work on reducing the complexity of BDD-based search. In addition, it was within this
area that the first guided BDD-based search algorithms were invented. There seem to be
two reasons why an approach similar to state-set branching has not been considered. First,
even though the state-set branching approach covers both asynchronous and synchronous
systems, it is more obvious to consider for an asynchronous system where the transition
relation can be efficiently encoded by a disjunctive partitioning. However, in formal
verification, most work on symbolic model checking considers synchronous systems. Second,
as discussed previously, heuristic search is often inefficient for synchronous systems with
a low transition graph diameter. Thus, only a limited amount of work has gone in this
direction.
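
The idea of propagating search node information through a partitioned transition relation can be pictured with explicit sets standing in for BDDs. The sketch below is not the BIFROST implementation: the function names, the choice to group transitions by their change in heuristic value, and the toy line-world domain are all assumptions made for illustration. The point it tries to convey is that the heuristic function is consulted only when the partitioning is built, so expanding a search node requires no arithmetic on the state representation.

```python
from collections import defaultdict


def branching_partitioning(transitions, h):
    """Group transitions (s, s_next) by the change in heuristic value h(s_next) - h(s)."""
    parts = defaultdict(set)
    for s, s_next in transitions:
        parts[h(s_next) - h(s)].add((s, s_next))
    return dict(parts)


def expand(states, h_value, partitions):
    """Expand the search node (states, h_value); all states in the node share h_value.

    Returns a dict mapping each child heuristic value to the child state set.
    The heuristic function itself is never called here.
    """
    children = {}
    for delta, part in partitions.items():
        image = {s_next for s, s_next in part if s in states}  # image under one partition
        if image:
            children[h_value + delta] = image
    return children


# Toy domain (hypothetical): positions 0..4 on a line, goal at 4, h(s) = distance to goal.
GOAL = 4
transitions = {(s, s + 1) for s in range(GOAL)} | {(s, s - 1) for s in range(1, GOAL + 1)}
partitions = branching_partitioning(transitions, lambda s: GOAL - s)
children = expand({2}, 2, partitions)
print(children)
```

In a symbolic setting each partition would be a BDD and each image a BDD operation, while the heuristic values would remain ordinary numbers attached to the search nodes.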


8.2  Non-Deterministic  Planning 

Non-deterministic planning in different disguises has been studied in AI, automata theory,
game theory, and DES control theory. The classical approach to non-deterministic planning
in AI is conditional planning. Conditional actions were first studied in WarPlan-C [167].
Modern conditional planners include CNLP [134], C-Buridan [46], and SGP [169].
CNLP is an extension of the partial order planner SNLP. It handles non-determinism by
constructing a conditional plan that accounts for each possible situation or contingency
that could arise. At execution time it is determined which part of the plan to execute by
performing sensing actions that are included in the plan to test for the appropriate
conditions. The returned plan is a finite tree where each branch is a sensing action. C-Buridan
combines conditional and probabilistic planning. A sensing action can be inserted in the
plan to increase the success probability. Branches can be rejoined such that the resulting
plan is more compactly represented as a DAG. SGP descends from GraphPlan. It has
been shown to outperform any of the previous planners obtained as extensions to classical
planners [169]. The most important limitation of conditional planning compared to
non-deterministic planning as defined in this thesis is that conditional plans are finite and
may grow exponentially with the number of unknown facts. A performance comparison
between the BDD-based non-deterministic planning system MBP [34] and SGP on the
Omelet problem from the SGP distribution shows that MBP scales much better than SGP
on this problem. Conditional plans can also be generated by QbfPlan [145]. QbfPlan
is a generalization of the SATPlan approach to the case of planning in non-deterministic
domains. The user must provide the number of control points and observations in the plan.
This can provide a significant limitation of the search space. However, in the Chain domain
provided with the QbfPlan distribution, MBP substantially outperforms it [34]. These
experimental results indicate that BDD-based non-deterministic planning is one of the most
efficient approaches to conditional planning. In particular, BDD-based non-deterministic
planning seems to be least sensitive to the amount of non-determinism in the domain. On
the other hand, if non-determinism is sparse, the planning graph approach employed by
SGP may be more efficient.
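
The exponential growth of conditional plans can be made concrete by counting the branches of a tree-shaped plan. The sketch below considers the worst case only, under the assumption that every unknown fact must be sensed on every branch; the function name and the fact names are hypothetical.

```python
def contingency_branches(unknown_facts):
    """List every combination of sensing outcomes for the given unknown facts."""
    if not unknown_facts:
        return [()]
    first, rest = unknown_facts[0], unknown_facts[1:]
    return [((first, value),) + tail
            for value in (True, False)
            for tail in contingency_branches(rest)]


for k in range(1, 6):
    facts = [f"fact_{i}" for i in range(k)]
    print(k, len(contingency_branches(facts)))
```

Each additional unknown fact doubles the number of branches, so a tree-shaped plan over k such facts has 2^k leaves in this worst case.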

Universal planning [154] is the non-deterministic planning approach most closely related
to the approach investigated in this thesis. The main difference is that we model
non-determinism explicitly. Instead, the original idea in universal planning is to cover every
domain state in order to make the plan robust to non-determinism (e.g., caused by failures or
simultaneous activity). A major challenge is to represent universal plans compactly. It has
been shown that even for a flexible circuit representation of universal plans in domains with
n Boolean state variables, the fraction of randomly chosen universal plans with polynomial
size in n decreases exponentially with n [65].1 However, universal plans encountered in
practice are normally far from randomly distributed. Often real-world planning problems
and their universal plan solutions are regularly structured. A primary objective is therefore
to develop efficient techniques for exploiting such structure. It is exactly the ability of
BDDs to capture the structure of many Boolean functions often met in practice that makes
them attractive for representing universal plans.

The first BDD-based universal planning system was MBP [36, 37]. The approach used
by MBP was further explored in UMOP [93]. MBP and UMOP are the only current
BDD-based universal planning systems. An alternative approach to universal planning is
SimPlan  [98].  SimPlan  generates  a  plan  from  a  forward  search  that  may  be  guided 
by  an  LTL  control  rule  formula  [5].  It  can  synthesize  plans  for  extended  goals  in  Linear 
Temporal Logic (LTL) [139]. This includes strong plans, but not strong cyclic plans, which
can only be expressed in CTL. A head-to-head comparison between MBP and SimPlan
in a robot delivery domain provided in the SimPlan distribution shows that SimPlan is
very sensitive to non-determinism and is outperformed by MBP even when applying search
control rules. MBP has later been extended to handle temporally extended goals given as a
CTL formula [138].

¹ There is nothing new in this result. Ginsberg's circuit has the same fate as any other known representation
of Boolean functions [121].

Non-deterministic planning as defined in this thesis does not involve transition
probabilities. While transition probabilities can provide useful information in some domains,
there are domains where modeling transition probabilities is hard in practice due to the
lack of statistical data. In addition, we may expect the computational complexity of
probabilistic planning to be higher than that of non-deterministic planning due to the more expressive
domain model. Probabilistic planners include DRIPS [70] and BURIDAN
[110]. DRIPS decomposes operators in an abstraction hierarchy in order to
find plans with maximum expected utility. BURIDAN is derived from SNLP and produces
plans  that  meet  a  threshold  probability.  The  produced  plans  are  finite  and  acyclic  and  are 
therefore  in  general  insufficient  to  guarantee  goal  achievement. 

Until  now,  we  have  only  considered  planning  approaches  where  a  plan  is  produced 
prior  to  execution.  An  alternative  approach  is  to  perform  planning  interleaved  or  in  parallel 
with execution. This can be done either by monitoring the plan execution and re-planning
whenever an action fails or by selecting an action in each step of the execution. Plan monitoring
and  re-planning  have  been  widely  used  in  non-deterministic  robotic  domains  (e.g.,  [62, 
172,  71]).  Action  selection  planners  can  be  based  on  real-time  heuristic  search  algorithms 
like  MIN-MAX  LRTA*  [105,  106].  The  MIN-MAX  LRTA*  search  algorithm  can  generate 
suboptimal  plans  in  non-deterministic  domains  through  a  search  and  execution  iteration. 
The  search  is  based  on  a  heuristic  goal  distance  function  that  must  be  provided  for  a  specific 
problem.  The  ASP  algorithm  [21]  uses  a  similar  approach  based  on  the  HSP  heuristic  [20]. 
In  contrast  to  MIN-MAX  LRTA*,  ASP  does  not  assume  a  non-deterministic  environment, 
but is robust to non-determinism caused by action perturbations (i.e., an action other
than the planned action is chosen with some probability). In general, planners interleaving
planning and execution are incomplete because acting on a partial plan can make the goal
unachievable.  However,  they  are  often  efficient  in  robotics  domains  where  most  actions  are 
reversible. 

Non-deterministic planning, as defined in this thesis, assumes full observability of
the states. The other extreme is to assume that the states are unobservable and generate
conformant  plans  [68,  169].  A  conformant  plan  is  a  sequence  of  actions  that  leads  to 
the  goal  independently  of  non-determinism  in  the  domain.  A  BDD-based  approach  to 
conformant  planning  has  been  studied  in  the  MBP  planning  framework  [35]  as  well  as  an 
approach  to  non-deterministic  planning  in  domains  with  partially  observable  states  [13].  In 
the  latter  work,  heuristics  have  been  applied  to  guide  the  expansion  of  an  AND-OR  graph 
where  nodes  are  BDDs  representing  sets  of  belief  states  [12]. 
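
As an illustration of the definition of a conformant plan given above, the following Python sketch checks whether an action sequence is conformant by progressing a belief state (the set of states the system may be in) through the plan. It uses an explicit transition table rather than BDDs, and the names `progress`, `is_conformant`, and the toy domain `TOY` are hypothetical. It is not the algorithm used by MBP [35].

```python
from typing import Dict, Optional, Sequence, Set, Tuple

State = str
Action = str
# Hypothetical explicit encoding: (state, action) -> set of possible successors.
Transitions = Dict[Tuple[State, Action], Set[State]]


def progress(belief: Set[State], action: Action,
             transitions: Transitions) -> Optional[Set[State]]:
    """Apply `action` to every state in the belief state.

    Returns the set of all possible successors, or None if the action is
    inapplicable in some state the system might be in.
    """
    successors: Set[State] = set()
    for state in belief:
        outcomes = transitions.get((state, action))
        if not outcomes:
            return None
        successors |= outcomes
    return successors


def is_conformant(plan: Sequence[Action], initial: Set[State],
                  goal: Set[State], transitions: Transitions) -> bool:
    """True if every execution of `plan` from every initial state ends in `goal`."""
    belief: Optional[Set[State]] = set(initial)
    for action in plan:
        belief = progress(belief, action, transitions)
        if belief is None:
            return False
    return belief <= goal


# Toy domain (assumed for illustration): the initial state is either a or b.
TOY: Transitions = {
    ("a", "reset"): {"c"},
    ("b", "reset"): {"c"},
    ("c", "go"): {"g1", "g2"},  # non-deterministic outcome
}

print(is_conformant(["reset", "go"], {"a", "b"}, {"g1", "g2"}, TOY))
print(is_conformant(["go"], {"a", "b"}, {"g1", "g2"}, TOY))
```

In the toy domain the plan reset, go is conformant even though the initial state is unknown and go is non-deterministic, because every possible execution ends in a goal state. The plan go alone is rejected because go is not applicable in the possible initial states.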

We  now  turn  to  discuss  approaches  to  non-deterministic  planning  developed  outside  of 
the  field  of  automated  planning.  Reinforcement  Learning  (RL)  [161]  can  be  regarded  as 
non-deterministic  planning.  In  RL  the  goal  is  represented  by  a  reward  function  in  a  Markov 
Decision Process (MDP) model of the domain. A non-deterministic plan solving the
problem is a policy mapping states to actions that maximizes the expected reward. The policy
can either be represented explicitly in a table or implicitly by a function (e.g., a neural
network). The major limitation of RL is its poor scalability. If states are represented explicitly,
only very small problem instances can be solved. Function approximation methods may
be applied to obtain an implicit representation of the domain. However, this may
compromise the convergence of the value-iteration methods used to find policies [24]. Symbolic
approaches  have  been  applied  to  RL.  SPUDD  [76]  uses  the  Algebraic  Decision  Diagram 
(ADD)  [6]  to  represent  value  functions  and  policies.  The  value-iteration  computation  of 
SPUDD  is  implemented  via  ADD  manipulations.  Substantial  performance  gains  may  be 
obtained with SPUDD compared to ordinary RL methods. Compared to BDD-based
non-deterministic planning, however, SPUDD is limited by the fact that it must represent a
possibly fast-growing set of different values of the value function.

The strong algorithm in different disguises has been discovered independently in
automata theory [3], automated planning [36, 37] and game theory [43]. In addition,
symbolic methods for supervisory controller synthesis that in principle can be used to
synthesize weak, weak adversarial, strong cyclic, strong cyclic adversarial and strong plans were
suggested as early as 1992 [78]. However, these specific algorithms have, as far as we
know,  not  been  described  in  the  DES  control  theory  literature.  We  are  also  not  aware  of 
any  work  in  DES  control  theory  that  studies  the  efficiency  aspect  of  symbolic  controller 
synthesis. 

Non-Deterministic  State-Set  Branching 

Non-deterministic  state-set  branching  is  to  our  knowledge  the  first  attempt  to  guide  a  BDD- 
based  search  for  a  non-deterministic  plan  as  defined  in  this  thesis.  We  have  not  been  able  to 
find any previous work of this kind in automated planning, automata theory, DES control
theory, or game theory. The closest work is the symbolic LAO* algorithm [57] used to
solve MDPs. However, this algorithm cannot be applied to problems without transition
probabilities. 


8.3  Fault  Tolerant  Planning 

Most of the non-deterministic planning approaches discussed in the previous section
focus on domains where failure is a key aspect. This is also the case for the large body of
work in AI on fault diagnosis (e.g., [102, 73, 155, 45]). However, work explicitly representing
and reasoning about success and failure effects of actions is very limited. The Elmer
system [117] uses error transitions from abstract actions to detect and recover from
failures. In the Procedural Reasoning System (PRS) [62], the procedure descriptions define
the effect of successful and unsuccessful execution of a procedure. Similarly, the Reactive
Model Based Programming Language (RMPL) [174] and its underlying executor Titan can
handle faults at runtime. The approach, however, does not involve computing a fault
tolerant plan. The MRG [67] planning language explicitly models failure effects. However,
this work does not include planning algorithms for generating fault tolerant plans. To our
knowledge,  the  n-fault  tolerant  planning  algorithms  introduced  in  this  thesis  are  the  first 
automated  planning  algorithms  for  generating  fault  tolerant  plans  given  a  description  of  the 
domain  that  explicitly  represents  failure  effects  of  actions. 

Similarly to AI, there has been a substantial amount of work on fault diagnosis in DES
control theory. This work has mainly focused on analysing event sequences in order to
determine if a fault has happened, and if so, which kind of fault [150, 151, 152, 159].
However, there has also been a considerable amount of work on fault models. These models
can be characterized as either transition based or state based. Most work (e.g., [31, 32, 38])
uses the transition based model and regards faults as unexpected changes in a system that
tend to degrade the overall system performance rather than causing a total breakdown.
In contrast, the term failure suggests a complete breakdown of a system component or function. The
transition based model is also used in supervisory control [142] where faults usually are
considered uncontrollable events [7, 32]. Within this frame, an approach to fault tolerant
control has been considered that is closely related to n-fault tolerant planning. The work
in [135] specifies fault tolerance for mission critical systems. A masking fault tolerant
system  can  recover  from  any  fault.  A  t-fault  tolerant  system  can  recover  from  up  to  t  faults 
occurring  during  its  life  time.  The  system  is  modeled  by  an  automaton  with  start  states,  but 
no  goal  states.  In  addition,  no  algorithms  or  theory  for  controller  synthesis  are  provided. 

The state based models usually divide the state space into ranges of operation of some
system (e.g., “normal operation range”, “admissible error range”, and “non-admissible
error range” [103], or “good” and “bad” states [128]). In Özveren’s work [128], stability is
defined as visiting the good states infinitely often. Thus, a controller is stable if, from
any reachable bad state, it can force a trajectory that reaches the good states in a finite
number of steps. Stabilizability is defined as the ability to choose state feedback such that the closed loop
system is stable. A related approach [129] defines Lyapunov stability of a class of DES.
Consider a set of states $X_m$ that is invariant in the plant. That is, any execution starting in
any state in $X_m$ stays within $X_m$. $X_m$ is stable in the sense of Lyapunov if for any $\epsilon > 0$ a
maximum distance $\delta > 0$ (given by some metric) can be found such that any execution starting
at a state within $\delta$ of $X_m$ ends up in a state less than $\epsilon$ from $X_m$.

We are not aware of work in any field other than AI and DES control theory that reasons
explicitly about failures in order to automatically synthesize fault tolerant plans or fault
tolerant  discrete  controllers. 


8.4  Adversarial  Planning 

Adversarial planning is related to work in AI on negotiation (e.g., [30, 181, 108]) and
collaboration (e.g., [63, 47, 83]) in multi-agent systems. The focus in this work, however, is more
on establishing frameworks for describing these problems than on developing efficient
algorithms for solving them. In particular, the only previous BDD-based multi-agent planning
system that we are aware of is UMOP [93], which is a predecessor to the work described
in this thesis. Another direction of work in AI applies planning algorithms to search in
a space of game states (e.g., Chess [170] and Bridge [157]). However, in contrast to the
adversarial planning algorithms introduced in this thesis, these approaches do not consider
complete solutions of the game. The same holds for game tree algorithms like
alpha-beta-MiniMax [125].

Adversarial  planning  is  related  to  game  theory  in  the  sense  that  both  offer  alternative 
approaches  for  generating  policies  for  adversarial  environments.  For  instance,  we  could 
enumerate  all  policies  of  the  environment  and  the  system  creating  an  appropriate  payoff 
matrix,  and  solve  this  as  a  normal  form  matrix  game  [127].  Alternatively,  we  could  apply 
game  tree  algorithms  and  solve  the  planning  problem  as  an  extensive  form  game  [127]. 
However, both of these approaches are intractable due to the exponential size of the
matrix and game tree in the number of state variables. The game-theoretic framework that
is most closely related to adversarial planning is stochastic games. Stochastic games extend
Markov decision processes to multiple agents. They are usually solved using value
iteration algorithms that require exponential space in the number of state variables (see e.g.
[156]). Function approximation techniques may be able to reduce the space requirements.
However, it is still unclear how these methods can be applied without sacrificing the
convergence properties of the value iteration algorithms. One of the advantages of BDD-based
adversarial  planning  is  to  avoid  such  explicit  representations. 

It has been noted in automata theory that winning strategies in two player games
correspond to strong plans and that such strategies can be computed symbolically using BDDs
[3]. This is independent of whether the moves by the two players are simultaneous or
interleaved. The non-deterministic model is strong enough to represent both situations. Thus,
this early work in automata theory to some extent subsumes later work in automated
planning employing BDDs for two-player games with alternating moves [49].

Adversarial  planning  has  been  studied  in  formal  verification  in  the  form  of  concurrent 
reachability games [43, 2, 97]. A strategy of a player is a mapping from states to a
probability distribution over a set of actions to apply in the state. A state s is sure if player 1 (the
system)  has  a  strategy  so  that  for  all  strategies  of  player  2  (the  environment),  the  game,  if 
started  in  s,  always  reaches  a  set  of  target  states  (goal  states).  Hence,  a  state  is  sure,  if  player 
1  has  a  strong  plan  for  reaching  the  target  states.  A  state  s  is  almost  sure  if  player  1  has  a 
strategy  so  that  for  all  strategies  of  player  2,  the  game,  if  started  in  s,  reaches  a  target  state 
with  probability  1.  Thus,  a  state  is  almost  sure,  if  player  1  has  a  strong  cyclic  adversarial 
plan  for  reaching  the  target  states.  Finally,  a  state  s  is  positive  if  player  1  has  a  strategy 
so  that  for  all  strategies  of  player  2,  the  game,  if  started  in  s  has  a  positive  probability  of 
reaching  the  target  states.  This  corresponds  to  a  weak  adversarial  plan. 
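
The correspondence between sure states and strong plans can be sketched with an explicit-state fixed-point computation. The Python code below is a minimal illustration only, not the symbolic algorithm of [43] or of this thesis. It assumes a hypothetical table `step` that maps a state and a joint action of the two players to a single successor state; the function name `sure_states` and the example game `GAME` are invented for this sketch.

```python
from typing import Dict, Sequence, Set, Tuple

State = int
# Hypothetical encoding: (state, player-1 action, player-2 action) -> successor.
Step = Dict[Tuple[State, str, str], State]


def sure_states(states: Sequence[State], sys_actions: Sequence[str],
                env_actions: Sequence[str], step: Step,
                targets: Set[State]) -> Tuple[Set[State], Dict[State, str]]:
    """Least fixed point of the states from which player 1 can force the target.

    A state is added when player 1 has an action such that every answer of
    player 2 leads to a state already known to be sure.
    """
    sure = set(targets)
    plan: Dict[State, str] = {}
    changed = True
    while changed:
        changed = False
        for s in states:
            if s in sure:
                continue
            for a in sys_actions:
                if all(step[(s, a, b)] in sure for b in env_actions):
                    plan[s] = a      # deterministic choice for player 1
                    sure.add(s)
                    changed = True
                    break
    return sure, plan


# Example game (assumed): state 3 is the only target state.
GAME: Step = {
    (0, "l", "x"): 0, (0, "l", "y"): 0, (0, "r", "x"): 1, (0, "r", "y"): 1,
    (1, "l", "x"): 3, (1, "l", "y"): 3, (1, "r", "x"): 3, (1, "r", "y"): 2,
    (2, "l", "x"): 2, (2, "l", "y"): 3, (2, "r", "x"): 3, (2, "r", "y"): 2,
}

sure, plan = sure_states([0, 1, 2, 3], ["l", "r"], ["x", "y"], GAME, {3})
print(sorted(sure), plan)
```

In this game, state 2 is not sure: for every fixed action of player 1 there is an action of player 2 that keeps the game in state 2. A player 1 that randomizes between l and r would nevertheless leave state 2 with probability 1, which is the distinction between sure and almost sure states drawn above.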

It is observed that the sets of sure, almost sure, and positive states can be computed
symbolically without representing probabilities. An algorithm similar to Strong is given
to compute sure states [43]. The algorithm for computing almost sure states is dual to
StrongCyclicAdversarial in the sense that it starts from all the states in the domain
and the most general strategy of player 1 and then iteratively prunes states and actions from
the strategy that can lead to states where player 2 can confine the game. In contrast,
StrongCyclicAdversarial iteratively increments the set of almost sure states. The work in
[43, 2, 97] is theoretical. There is no experimental evaluation of the approach. The
primary goal of our work on adversarial planning is scalability. The incremental approach of
StrongCyclicAdversarial  has  been  chosen  because  we  believe  this  approach  is  more 
efficient  even  in  its  blind  version.  More  importantly,  however,  this  format  of  the  algorithm 
makes  it  possible  to  apply  search  heuristics  using  non-deterministic  state-set  branching. 


8.5  Planning  Languages 

Classical deterministic planning languages like STRIPS [58], ADL [132], and PDDL [118]
represent domains in first order logic. Such representations can be encoded compactly with
BDDs as described in Section 3.1.1, but it is more natural to use state variable
representations as in NADL+. Non-deterministic planning languages related to NADL+ include AR
[66] and NuPDDL [137], both of which are used as input languages to MBP [33]. The action
description language AR can represent propositional and non-propositional fluents with
finite domains. Actions may change the value of fluents non-deterministically. Compared
to AR, NADL+ introduces numerical state variables, an explicit environment model, and
an explicit representation of failure effects of actions. In addition, it includes features for
defining transition costs and for propagating search information between states. The only
prior planning language that we are aware of that explicitly models action failure is MRG [67].
However, this language does not explicitly model the actions of an uncontrollable
environment. NuPDDL descends from PDDL 2.1 [60], which can represent numeric-valued fluents
and time. In addition, NuPDDL can model uncertainty in initial states and non-deterministic
action effects. However, it has no constructs for explicitly describing the actions of an
uncontrollable environment.

Designs in formal verification are often described as a collection of concurrent
non-deterministic modules. For instance, the input language to the model checker SMV [119]
defines each module as a set of state variables and an expression stating the possible
assignments to the variables. Modules in digital circuit description languages such as VHDL
and  Verilog  [16,  15]  describe  a  unit  in  the  circuit  at  some  level  of  detail  by  defining  the 
computations  mapping  signals  from  input  to  output  wires.  Similarly  to  NADL+,  designs  in 
these  languages  typically  describe  a  closed  system  where  both  the  behavior  of  the  system 
and  its  environment  are  defined.  However,  it  is  not  obvious  how  to  use  these  languages  as 
planning  languages  since  a  design  is  assumed  to  describe  a  controlled  system. 

In  DES  control  theory,  systems  are  often  described  visually  using  Petri  nets  [136]  or 
object  oriented  description  languages  such  as  the  Unified  Modeling  Language  (UML)  [22]. 
These representations, however, often grow fast with the size of the system and, like
design representations in formal verification, often describe a controlled system.


8.6  Summary 

In  this  chapter,  we  have  discussed  work  related  to  the  thesis.  The  investigation  of  related 
work  is  based  on  previous  work  in  AI,  DES  control  theory,  formal  verification,  game  theory, 
and automata theory. The main conclusions are:

1. Using BDDs for non-deterministic search and for representing non-deterministic
plans seems to be the currently most efficient approach to non-deterministic
planning for domains with dense non-determinism.

2. State-set branching appears to be the currently most general and most
computationally efficient framework for combining classical heuristic search and BDD-based
search.

3. Non-deterministic state-set branching is, as far as we know, the first framework for
guiding BDD-based search algorithms that generate non-deterministic plans as
defined in this thesis.

4.  The  fault  tolerant  planning  algorithms  introduced  in  the  thesis  are  to  our  knowledge 
the  first  algorithms  to  synthesize  n-fault  tolerant  control  strategies  given  a  domain 
description  that  explicitly  represents  successful  and  failure  effects  of  actions. 

5. Adversarial planning is, as far as we know, the first work that studies fully
implemented and complete symbolic algorithms for synthesizing strategies for winning
concurrent  reachability  games  with  probability  1  or  positive  probability.  To  our 
knowledge,  it  also  is  the  first  work  that  provides  such  algorithms  in  a  format  that 
enables  guided  search  techniques  to  be  applied. 

6.  NADL+  is  to  our  knowledge  the  first  representation  language  suitable  for  planning 
that  both  explicitly  represents  uncontrollable  environment  actions  and  failure  effects 
of  actions. 


Chapter  9 
Conclusion 


In  this  chapter,  we  first  briefly  summarize  the  main  contributions  of  the  thesis  in  Section  9.1. 
Then  in  Section  9.2,  we  consider  non-deterministic  planning  as  a  possible  future  approach 
to  automated  controller  synthesis. 


9.1  Contributions 

The goal of this thesis has been to push the current state-of-the-art of BDD-based
non-deterministic planning in two independent directions. The first of these is to develop
BDD-based non-deterministic planning algorithms with high performance. To this end, we have
developed a general framework called state-set branching that seamlessly combines
deterministic BDD-based search and classical heuristic search. Our experimental results show
that the performance of a state-set branching implementation of the A* algorithm often
dominates both blind BDD-based search and the ordinary A* algorithm. In addition, it
consistently outperforms the previous BDD-based implementation of A*. We have shown that
state-set branching generalizes to non-deterministic planning and have introduced
heuristically guided algorithms for weak, strong cyclic, and strong non-deterministic planning.
Our  experimental  results  show  that  extensive  performance  gains  can  be  obtained  with  these 
algorithms  compared  to  the  ordinary  blind  BDD-based  search  algorithms,  both  in  terms  of 
computational  efficiency  and  the  size  of  the  produced  plans. 

The second direction of work in the thesis is to improve the current solution classes
in BDD-based non-deterministic planning. To this end, we have introduced two new
frameworks called fault tolerant planning and adversarial planning. Fault tolerant planning
extends the non-deterministic domain model with an explicit description of the effect of
failing actions. In this way, it is possible to define a new class of non-deterministic plans
called n-fault tolerant plans. Compared to strong cyclic and strong plans, the advantage
of fault tolerant plans is that they do not have to take all possible fault combinations into
account. N-fault tolerant plans guarantee goal achievement, but only if no more than n
faults occur during execution. The fault tolerant planning algorithms introduced in the
thesis are the first to synthesize n-fault tolerant control strategies given a domain description
that explicitly represents successful and failure effects of actions.
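
One way to read the guarantee of an n-fault tolerant plan is as a layered fixed point: a state tolerates k faults if some action makes progress towards the goal when it succeeds and, when it fails, leads to a state that tolerates k-1 faults. The Python sketch below illustrates this reading on explicit states under simplifying assumptions (each action has exactly one success effect and a possibly empty set of failure effects). The function `fault_tolerant_states` and the toy domain are hypothetical, and the sketch is not the BDD-based algorithm developed in the thesis.

```python
from typing import Dict, List, Set, Tuple

State = str
Action = str
Success = Dict[Tuple[State, Action], State]        # one success effect
Failure = Dict[Tuple[State, Action], Set[State]]   # possible failure effects


def fault_tolerant_states(success: Success, failure: Failure,
                          goal: Set[State], n: int) -> List[Set[State]]:
    """Return layers where layers[k] holds the states from which the goal is
    reached provided that at most k faults occur during execution."""
    layers: List[Set[State]] = []
    for k in range(n + 1):
        covered = set(goal)
        changed = True
        while changed:
            changed = False
            for (s, a), nxt in success.items():
                if s in covered or nxt not in covered:
                    continue
                faults = failure.get((s, a), set())
                # With k = 0 no fault may occur, so failure effects are ignored.
                if k == 0 or faults <= layers[k - 1]:
                    covered.add(s)
                    changed = True
        layers.append(covered)
    return layers


# Toy domain (assumed): D is a dead end that is never a goal state.
SUCCESS: Success = {("A", "fast"): "G", ("X", "limp"): "G"}
FAILURE: Failure = {("A", "fast"): {"X"}, ("X", "limp"): {"D"}}

for k, layer in enumerate(fault_tolerant_states(SUCCESS, FAILURE, {"G"}, 2)):
    print(k, sorted(layer))
```

In the toy domain, state A tolerates one fault but not two: if fast fails, the system ends in X, where a second fault leads to the dead end D.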

Adversarial planning extends the non-deterministic domain model with a set of
uncontrollable environment actions that cause the outcome of the controllable actions to be
non-deterministic. We show that ordinary strong cyclic plans may never reach a goal state if the
environment is an informed opponent. To address this problem, we introduce two classes
of adversarial plans called weak adversarial plans and strong cyclic adversarial plans. We
present two BDD-based algorithms for computing these plans. The algorithms are
extensions of the previous weak and strong cyclic planning algorithms and may be defined in a
guided  version  using  non-deterministic  state-set  branching.  To  our  knowledge,  adversarial 
planning  is  the  first  work  that  studies  fully  implemented  and  complete  symbolic  algorithms 
for  synthesizing  strategies  for  winning  concurrent  reachability  games  with  probability  1  or 
positive  probability. 

The thesis demonstrates that BDD-based non-deterministic planning can scale to
significant real-world domains such as the Deep Space 1 domain, the SIDMAR steel
producing plant, and the Power Supply Restoration domain (PSR). The thesis, however, does
not include experimental work on executing these plans in order to control the physical
systems they model. There are several issues that must be considered when applying
non-deterministic plans.

1.  The  plans  do  not  provide  any  probability  distribution  over  the  actions  to  apply  in  a 
state. For some domains this may restrict the usability of the plans.

2.  Except  for  adversarial  plans,  it  is  assumed  that  only  a  single  activity  takes  place  in 
each  time  step.  The  activities  may  represent  events  in  a  discrete  event  system,  but 
it  is  not  natural  to  use  a  planning  language  like  NADL+  to  encode  such  domains. 
In  addition,  the  non-deterministic  model  does  not  provide  a  solution  to  the  timing 
problem  of  these  events. 

3.  In  order  to  select  actions  from  a  plan,  the  current  state  must  be  fully  observable. 
Often  this  is  not  the  case.  The  problem  may  be  addressed  by  abstracting  the  domain 
model  to  the  subset  of  observable  state  variables.  This,  however,  may  restrict  the 
ability  to  produce  practically  useful  plans. 

Despite these constraints, we believe that there exist important applications where the
non-deterministic abstraction or one of the extensions described in the thesis is strong enough
to produce practically useful solutions. In addition, we are convinced that the simplicity
of the non-deterministic abstraction compared to, for example, Markov decision processes or
timed  automata  may  be  necessary  to  scale  to  the  extremely  large  domains  often  considered 
in  real-world  applications.  For  these  reasons,  we  recommend  a  continued  focus  on  this  line 
of  research  in  the  future. 


9.2  Outlook  and  Future  Directions 

An interesting direction of future work is to use non-deterministic BDD-based planning for
automated controller synthesis. Non-deterministic plans correspond to discrete, memoryless,
and untimed controllers where the task is to force the controlled system into a set of
goal states. There is a wide range of high profile application domains for such controllers
including automated production, traffic control, robotics, and embedded systems, just to
mention a few. A surprising fact is that the current efforts on developing efficient controller
synthesis algorithms are very limited. Automated planning has a strong focus on
developing efficient data structures and algorithms to make planning systems scale. There has
been a significant amount of work on non-deterministic domains and robotics applications.
However, automated planning has not traditionally had close ties to the application domains
mentioned above. In DES control theory, the situation is the opposite. There has always
been a close connection to industrial applications, but the efforts on developing efficient
algorithms and data structures for automated DES controller synthesis have been limited.
The reason for this may be partially historical since the programming of control
switching boards has mainly been considered a technician's field [109]. The game tree algorithms
[125] and real time reactive planning algorithms [105] developed in AI are probably some
of the most scalable approaches to automated discrete control known today. A limitation of
these algorithms, however, is that they are incomplete and may drive the system into an
unrecoverable state that could have been avoided given a complete search prior to execution.
In formal verification there has been extensive work on automated verification of discrete
controllers (e.g., [11]). However, the amount of work on controller synthesis is limited.

Several major questions need to be answered in order to mature BDD-based
non-deterministic planning for automated controller synthesis. First of all, a library of industrial
benchmark problems needs to be established similar to the benchmark suites used in
formal verification. This will clarify the distribution and character of the relevant problems
and help to guide the development of specialized algorithms for key problems. Second, an
appropriate family of specification languages and domain description languages must be
developed. The current approach in DES control theory is to use Petri nets or complete
state transition graphs to represent domains and specifications, which does not seem to scale
to large and combinatorially complex problems. Finally, it needs to be clarified how
efficiently controllers represented by BDDs can be mapped to integrated circuits. If circuit
representations of controllers tend to be large, this strongly limits the applicability of the
approach. 
Appendix A
BIFROST 


This appendix contains a description of the Bdd-based InFoRmed planning and controller
Synthesis Tool (BIFROST). Section A.1 is a user guide to BIFROST version 0.7. Section A.2
describes the syntax and semantics of NADL+ used as input language to BIFROST.
Finally, Section A.3 describes the experimental setting used for the experiments described
in the thesis.


A.1 User Guide to BIFROST 0.7

BIFROST version 0.7 is a software package for BDD-based deterministic and non-deterministic
planning and heuristic search. The program is written in C++/STL for the GNU GCC
compiler running on a Redhat Linux 7.1 PC. The software is open source and may be used
for scientific and teaching purposes. BIFROST uses the BuDDy 2.0 BDD-package [112].1


A.1.1  Usage 

Follow the instructions on the BIFROST web site [87] to download and install the program.
BIFROST is a regular UNIX command

bifrost -d domainFile [-iaxyvponcrthufeg]

The options of BIFROST are shown by executing bifrost -h. The input to BIFROST is
a planning problem written either in the STRIPS part of PDDL [118] or in NADL+, described
in Section A.2. Option -i type defines the input type and must be set. The possible values
for type are PDDL and NADL. If the input is PDDL, two input files must be given. The first
is the PDDL domain description, which is set by option -d domain file name. The second
is the PDDL problem description, which is set by option -p problem file name. If the input
is NADL+, only a single NADL+ input file is given, containing both a domain description
and a problem description. The name of the NADL+ file is set with option -d domain file
name. The verbosity level is set by option -v num, where num is a non-negative number.
The higher the value of num, the more information BIFROST dumps to the screen.

1 Comparison experiments with the CUDD package [158] have not shown a significant
performance difference [85].

The memory parameters of the BuDDy package are adjusted with options -n num
and -c num. The -n option sets the number of BDD nodes allocated to represent the
shared BDD, while the -c option sets the number of BDD nodes allocated to represent
the BDDs in the operator caches used to implement dynamic programming. For
medium-sized problems, good values for n and c are around 1M and 400K, respectively. The
BuDDy package can also be initialized to use dynamic variable reordering with option
-r type. The possible values of type are Off (no dynamic variable reordering) and
Win2ite (sliding window reordering).2

The search algorithm used by BIFROST is set by option -a type. The possible values of
type are shown in Table A.1. All BDD-based algorithms implemented in BIFROST rely on
a disjunctive partitioning of the transition relation. The threshold for merging partitions is
set by option -t num. The search timeout bound is set by option -1 num where num is the
timeout bound in seconds. For bidirectional, forward, and backward deterministic search,
frontier set simplification based on [41] can be activated by option -f. For the weighted
A* algorithms, f is given by f = x*g + y*h where x and y are in the range [0, 1] and are
set by options -x num and -y num. For PDDL problems, the heuristic function is given by
option -g type where the possible values of type are MinHamming, which is the minimum
Hamming distance, and HSPr, which is the HSPr heuristic described in [20]. For any of the
state-set branching algorithms, option -u num sets the upper bound of the size of merged
BDDs in the search queue.
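The weighted evaluation function can be illustrated with a few lines of Python. This is only a sketch of the formula f = x*g + y*h stated above; the function name weighted_f and the sample values are assumptions made for illustration and are not part of BIFROST.

```python
def weighted_f(g: float, h: float, x: float = 1.0, y: float = 1.0) -> float:
    """Return f = x*g + y*h for path cost g and heuristic estimate h."""
    # The text states that both weights lie in the range [0, 1].
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("x and y are expected to lie in [0, 1]")
    return x * g + y * h


# x = y = 1 is the unweighted A* evaluation f = g + h.
print(weighted_f(g=4, h=3))
# x = 0 ranks states by the heuristic estimate only.
print(weighted_f(g=4, h=3, x=0.0))
# y = 0 ranks states by the accumulated cost only.
print(weighted_f(g=4, h=3, y=0.0))
```

Lowering x relative to y makes the search greedier, while lowering y moves it towards a cost-ordered search.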

BIFROST  can  write  two  different  output  files.  The  first  is  the  solution  file.  Its  name 
is  set  by  option  -o  solution  file  name.  For  deterministic  problems,  it  is  a  text  file  with  a 
solution given as a sequence of actions. For non-deterministic problems, it is a BDD file
representing the produced non-deterministic plan. The second possible output file is for
conducting  experiments  with  BIFROST.  The  name  of  the  experiment  file  is  set  by  option 
-e  experiment  file  name.  The  experiment  file  is  a  text  file  with  data  about  the  search 
including  time  to  allocate  memory,  analyse  the  domain,  build  the  transition  relation,  and 
search.  In  addition,  it  contains  information  about  the  size  of  the  solution,  the  average  size 
2See  the  BuDDy  2.0  user  manual  for  a  detailed  description. 
Deterministic Search Algorithms

Bidir                 BDD-based breadth-first bidirectional search.
Forward               BDD-based breadth-first forward search.
Backward              BDD-based breadth-first backward search.
ghSetAstar            GhSetA* in a weighted version (f = x * g + y * h).
fSetAstar             fSetA* in a weighted version (f = x * g + y * h).
Astar                 Ordinary weighted A* with explicit state representation
                      and cycle detection. The input must be in PDDL format.
BDDAstar              BDDA*.
iBDDAstar             Improved BDDA*.

Non-Deterministic Search Algorithms

Weak                  Weak.
WeakH                 GuidedWeak.
WeakAdv               WeakAdversarial. The input must be in NADL+ format.
StrongCyclic          StrongCyclic.
StrongCyclicH         GuidedStrongCyclic.
StrongCyclicAdv       StrongCyclicAdversarial. The input must be in NADL+ format.
Strong                Strong.
StrongH               GuidedStrong.
FaultTolerant         1-FTP. The input must be in NADL+ format.
GuidedFaultTolerant   1-GFTP. The input must be in NADL+ format.

Table A.1: BIFROST search algorithms.


of  the  BDDs  representing  the  search  frontier,  and  the  number  of  iterations  of  the  algorithm. 
Finally,  it  summarizes  the  parameters  of  the  BDD  package  and  the  name  of  the  input  file. 
If  an  experiment  file  already  exists  with  the  same  name,  BIFROST  appends  its  result  to  the 
file.  Otherwise,  it  creates  the  file  and  adds  the  first  row  of  results. 

A.1.2 Examples

bifrost  -i  NADL  -d  51ine.nadl  -a  WeakH  -t  5000  -e  WeakH.dat 
-n  15000000  -c  500000  -v  1  -1  1000 
Initializes the BuDDy package with 15M BDD nodes and a cache of 500K BDD nodes.
BIFROST then builds a disjunctive partitioning of the NADL+ problem described in the
file 51ine.nadl with a merging threshold of 5000 BDD nodes. The GuidedWeak
non-deterministic planning algorithm is used to find a solution. The timeout bound is set
to 1000 seconds and the debug verbosity level is 1. The experimental results are written to
the file WeakH.dat.

bifrost -i PDDL -d domain.pddl -a BDDAstar -v 1 -e BDDAstar.dat
-p single01.pddl -t 4000 -n 8000000 -c 400000 -g HSPr

Initializes the BuDDy package with 8M BDD nodes and a cache of 400K BDD nodes.
BIFROST then builds a disjunctive partitioning of the PDDL domain and problem
described in the files domain.pddl and single01.pddl with a merging threshold of
4000 BDD nodes. The BDDA* algorithm is used to find a solution using the HSPr
heuristic. The timeout bound is set to the default 500 seconds and the debug verbosity level is 1.
The experimental results are written to the file BDDAstar.dat.

bifrost -i NADL -d D4V4M15.nadl -g MinHamming -1 500 -u 200
-n 8000000 -c 700000 -x 1.0 -y 1.0 -t 5000 -e ghSetAstar.exp
-a ghSetAstar

Initializes the BuDDy package with 8M BDD nodes and a cache of 700K BDD nodes.
BIFROST then builds a disjunctive partitioning of the NADL+ problem described in the file
D4V4M15.nadl with a merging threshold of 5000 BDD nodes. The GhSetA* algorithm
with f = 1.0 * g + 1.0 * h is used to find a solution using the minimum Hamming distance
heuristic. BDDs with a size below 200 are merged in the search queue of GhSetA*. The
timeout bound is set to 500 seconds and the debug verbosity level is 1. The experimental
results are written to the file ghSetAstar.exp.
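When many runs are scripted, it can help to assemble the argument list programmatically. The Python sketch below only builds the command line and does not execute BIFROST. The helper name bifrost_argv and the file names domain.pddl and problem.pddl are assumptions for illustration; the flags themselves (-i, -d, -a, -p, -t, -n, -c, -g, -e, -v) are the ones described in Section A.1.1.

```python
import shlex


def bifrost_argv(input_type, domain, algorithm, problem=None, **options):
    """Build an argument list for a BIFROST run (hypothetical helper)."""
    argv = ["bifrost", "-i", input_type, "-d", domain, "-a", algorithm]
    if input_type == "PDDL":
        # PDDL input needs a separate problem file; NADL+ input does not.
        if problem is None:
            raise ValueError("PDDL input requires a problem file (-p)")
        argv += ["-p", problem]
    # Remaining single-letter options are passed through as "-flag value".
    for flag, value in options.items():
        argv += [f"-{flag}", str(value)]
    return argv


argv = bifrost_argv(
    "PDDL", "domain.pddl", "BDDAstar", problem="problem.pddl",
    t=4000, n=8_000_000, c=400_000, g="HSPr", e="BDDAstar.dat", v=1,
)
print(shlex.join(argv))
```

The resulting list could be handed to a process launcher once BIFROST is installed; keeping it as a list avoids shell quoting issues.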


A.2 NADL+

NADL  was  developed  as  a  part  of  the  UMOP  project  [84,  93,  92,  94].  However,  despite 
providing  a  very  general  framework  for  modeling  non-deterministic  planning  problems, 
NADL  does  not  allow  additional  information  about  transition  costs,  heuristic  estimates, 
and failure effects of actions. NADL+ adds these features to the language. There are
three main differences between the two languages:

1. NADL+ has three new optional action description components dg, dh, and err. In
addition, it uses the entry heu to define the value of the heuristic estimate in the initial
state and the goal states,

2. An action description may consist of descriptions of several transition groups,

3. NADL+ assumes that the system and environment are described by a set of actions
instead of a set of agents.

The action component dg: int associates a transition cost or weight with the action. The
component dh: int describes the change of a heuristic estimate associated with each
transition represented by the transition group. The change is always given in forward direction
even if the heuristic guides a backward search. Finally, err: formula defines a set of next
states reached by the action given that its execution fails.

An  NADL+  problem  description  consists  of:  a  set  of  state  variables,  a  set  of  system  and 
environment  actions,  and  an  initial  and  goal  condition.  The  set  of  state  variable  assignments 
defines  the  state  space  of  the  domain.  The  set  of  system  actions  must  be  non-empty  while 
the  set  of  environment  actions  may  be  empty  if  no  active  environment  exists.  System  and 
environment actions are assumed to be synchronous. At each step, exactly one system
action and one environment action are performed. The resulting action is called a joint action. Only
the system actions are controllable. An action has three main parts: a set of modified state
variables, a precondition formula, and an effect formula. The set of modified variables contains
the state variables which may have their value changed by the action. In order for an action
to  be  applicable,  the  precondition  formula  must  be  satisfied  in  the  current  state.  The  effect 
of  the  action  is  defined  by  the  effect  formula.  The  value  of  state  variables  not  modified  by  a 
joint  action  is  unchanged.  The  initial  and  goal  condition  are  formulas  that  must  be  satisfied 
in  the  initial  state  and  the  goal  states,  respectively. 

Example A.1 An NADL+ planning problem is shown in Figure A.1. The problem has
two state variables pos and power. The position is a natural number that can be represented
by three Boolean variables. This gives pos the domain {0, 1, 2, 3, 4, 5, 6, 7}. The power
is a proposition and is represented by a single Boolean state variable. The system is a
robot moving between the eight positions. It has two actions Right and Left. The cost
of both actions is 1. The heuristic is for guiding a backward search from the goal states
to the initial state. It therefore estimates the distance to the initial state. This estimate
is simply the value of the position. Thus, a successful Left action changes the heuristic
estimate by -1, while a successful Right action changes it by +1. The effect of the
Right action depends on the power variable. If the power is true then the position is
increased, otherwise nothing happens. For this reason, the transitions of the Right action
are partitioned into two transition groups where the first describes the successful outcome
of the action where dh is 1, and the second describes the unsuccessful outcome of the
action where dh is 0. The Left action is assumed to succeed independent of the value
of power. It can therefore be described by a single transition group. The environment
controls the power with two actions On and Off. Since the system and environment must
apply exactly one action at each step, there are four joint actions Left-On, Left-Off,
Right-On, and Right-Off. Initially, the power is on and the robot is at position 0. The goal is
to reach position 7. The value of the heuristic estimate must be given for the goal states
in order to use a branching partitioning to propagate the value of the heuristic estimate to
other states. This is done by adding the entry heu: 7 to the goal condition. ◊
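The joint-action semantics of the example can be sketched with an explicit-state Python model. This is not how BIFROST represents the domain (it uses BDDs); it is a minimal illustration under two assumptions that are labeled in the code: a joint action is applicable when the preconditions of both its system and its environment action hold in the current state, and the next state combines the two effects on their disjoint modified variables. The names SYSTEM, ENVIRONMENT and joint_successors are hypothetical.

```python
from itertools import product

# System actions: a list of transition groups, each a (precondition, effect)
# pair evaluated on the current state; the effect returns the next pos.
SYSTEM = {
    "Right": [
        (lambda pos, power: pos < 7 and power, lambda pos: pos + 1),      # dh: 1
        (lambda pos, power: pos < 7 and not power, lambda pos: pos),      # dh: 0
    ],
    "Left": [
        (lambda pos, power: pos > 0, lambda pos: pos - 1),                # dh: -1
    ],
}

# Environment actions: (precondition, next value of power).
ENVIRONMENT = {
    "On": (lambda pos, power: not power, True),
    "Off": (lambda pos, power: power, False),
}


def joint_successors(state):
    """Map each applicable joint action name to its set of next states.

    Assumption: both preconditions must hold in the current state, and the
    effects are combined because the modified variables are disjoint.
    """
    pos, power = state
    result = {}
    for (s_name, groups), (e_name, (e_pre, next_power)) in product(
        SYSTEM.items(), ENVIRONMENT.items()
    ):
        if not e_pre(pos, power):
            continue
        for s_pre, s_eff in groups:
            if s_pre(pos, power):
                result.setdefault(f"{s_name}-{e_name}", set()).add(
                    (s_eff(pos), next_power)
                )
    return result


print(joint_successors((0, True)))   # initial state: pos = 0, power on
print(joint_successors((1, False)))
```

Under these assumptions only Right-Off is applicable in the initial state, since Left requires pos > 0 and On requires the power to be off.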


Syntax  of  NADL+ 

Below  is  the  BNF  syntax  of  NADL+.  The  syntax  of  formulas  is  given  separately. 


⟨NADL+⟩      ::= variables ⟨VarDecl⟩ {⟨VarDecl⟩}
                 system ⟨ActionDecl⟩ {⟨ActionDecl⟩}
                 environment {⟨ActionDecl⟩}
                 initially ⟨Formula⟩ [heu : ⟨Number⟩]
                 goal ⟨Formula⟩ [heu : ⟨Number⟩]

⟨VarDecl⟩    ::= ⟨VarType⟩ ⟨IdLst⟩

⟨VarType⟩    ::= bool
               | nat( ⟨Number⟩ )

⟨IdLst⟩      ::= ε
               | ⟨Id⟩
               | ⟨Id⟩ {, ⟨Id⟩}

⟨ActionDecl⟩ ::= ⟨Id⟩ ⟨TranDecl⟩ {⟨TranDecl⟩}

⟨TranDecl⟩   ::= [dg : ⟨Number⟩]
                 [dh : ⟨Number⟩]
                 mod : ⟨IdLst⟩
                 pre : ⟨Formula⟩
                 eff : ⟨Formula⟩
                 [err : ⟨Formula⟩]
variables
  nat(3) pos
  bool power

system
  Right
    dg: 1
    dh: 1
    mod: pos
    pre: pos < 7 ∧ power
    eff: pos' = pos + 1

    dg: 1
    dh: 0
    mod: pos
    pre: pos < 7 ∧ ¬power
    eff: pos' = pos
  Left
    dg: 1
    dh: -1
    mod: pos
    pre: pos > 0
    eff: pos' = pos - 1

environment
  On
    mod: power
    pre: ¬power
    eff: power'
  Off
    mod: power
    pre: power
    eff: ¬power'

initially
  pos = 0 ∧ power

goal
  pos = 7
  heu: 7


Figure A.1: An NADL+ planning problem.
An identifier is a sequence of numbers, letters and the character _ that does not begin
with a number. The syntax of formulas is given below. The -> operator is an if-then-else
operator. The relation operator <> denotes not equal to. The Boolean operators => and
<=> denote logical implication and bi-implication, respectively. The other operators have
their usual semantics.

⟨Formula⟩ ::= ⟨Formula⟩ -> ⟨Formula⟩ , ⟨Formula⟩
            | ⟨Formula⟩ ⟨BoolOp⟩ ⟨Formula⟩
            | ⟨NumExp⟩ ⟨RelOp⟩ ⟨NumExp⟩
            | ¬ ⟨Formula⟩
            | ( ⟨Formula⟩ )
            | true
            | false
            | ⟨Id⟩

⟨BoolOp⟩  ::= => | <=> | /\ | \/

⟨RelOp⟩   ::= = | <> | > | <

⟨NumExp⟩  ::= ⟨Id⟩
            | ⟨Number⟩
            | ⟨Number⟩ ⟨NumOp⟩ ⟨Number⟩

⟨NumOp⟩   ::= + | -

A.3  Experimental  Setting 

All experiments presented in this thesis have been carried out with BIFROST version 0.7.
However, specialized search engines have been implemented for the channel routing
experiments and experiments with the ordinary A* algorithm for other heuristics than HSPr. All
experiments have been executed on a Redhat Linux 7.1 PC with kernel 2.4.16, 500 MHz
Pentium III CPU, 512 KB L2 cache and 512 MB RAM.

PERL scripts are used to execute series of experiments. The results of these experiments
are logged in BIFROST experimental files. The experimental files of a particular domain
are collected in a master file describing the complete setup of the experiments for future
reference and reproduction. The master file also contains a description of the purpose of
the experiments and important observations. The main parameters of an experiment, in
addition to the problem and the search algorithm, are

1. The n and c parameters of the BuDDy BDD-package, where n is the number of
BDD nodes allocated to represent the shared BDD, and c the number of BDD nodes
allocated to represent BDDs in the operator caches used to implement dynamic
programming,

2. The threshold t for merging partitions in the disjunctive partitioning, given as the number
of BDD nodes used to represent the partitions,

3. The upper bound u of the size of merged BDDs in the search queue of the state-set
branching algorithms.

Time  is  measured  in  seconds.  The  size  of  a  BDD  is  equal  to  the  number  of  nodes  in  the 
BDD  graph. 
Appendix B
Proofs 


This appendix contains soundness, completeness, and optimality proofs. The proofs for the
Weak, StrongCyclic, and Strong algorithms are partially based on previous proofs
in [34].


B.1 Notation

Most algorithms consist of an initialization of a set of variables and a main loop that assigns
these variables and possibly a set of variables local to the loop. As an example consider the
NDP algorithm introduced in Section 3.2.2 and shown below in Figure B.1. The variables

function NDP(s0, G)
1  P ← ∅; C ← G
2  while s0 ∉ C
3      Pc ← PreComp(C)
4      if Pc = ∅ then return "no solution exists"
5      else
6          P ← P ∪ Pc
7          C ← C ∪ States(Pc)
8  return P


Figure B.1: The NDP algorithm introduced in Section 3.2.2.


of NDP that are initialized outside the loop and assigned to inside the loop are $P$ and $C$,
while $P_c$ is a local variable of the loop. We will use the following naming conventions for
statements about these variables. For a variable $V$, $V_i$ denotes the value of the variable after
$i$ iterations of the loop. Thus, for NDP, $C_i$, $P_i$, and $P_{c_i}$ denote the values of $C$, $P$, and $P_c$
after $i$ executions of the code in lines 2 to 7. If $V$ is assigned to several times in an iteration
of the loop, $V_i$ refers to its value after the last assignment. If $V$ is initialized before the loop
then $V_0$ denotes its initial value before the first iteration of the loop. If $V$ is a local variable
of the loop then $V_0$ is undefined. Thus, for NDP, $P_{c_0}$ is undefined. $V_i$ is said to exist if the
loop iterates at least $i$ times.
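The NDP loop and the indexing convention can be sketched in Python with explicit sets instead of BDDs. This is an illustration only, not the BIFROST implementation. The transition function is assumed to be a dictionary nxt mapping a state-action pair (s, a) to the set Next(s, a). The two precomponent functions follow the set definitions used in the proofs of Lemma B.5 and Lemma B.10, and the list history records (P_i, C_i) at index i. All Python names are hypothetical.

```python
def precomp_weak(C, nxt):
    """SAs (s, a) with s outside C and at least one successor in C."""
    return {(s, a) for (s, a), succ in nxt.items() if s not in C and succ & C}


def precomp_strong(C, nxt):
    """SAs (s, a) with s outside C and all successors (at least one) in C."""
    return {(s, a) for (s, a), succ in nxt.items()
            if s not in C and succ and succ <= C}


def ndp(s0, G, nxt, precomp):
    """Explicit-state version of NDP; returns (plan or None, history)."""
    P, C = set(), set(G)
    history = [(frozenset(P), frozenset(C))]      # (P_0, C_0)
    while s0 not in C:
        Pc = precomp(C, nxt)                      # P_{c_i}, local to the loop
        if not Pc:
            return None, history                  # "no solution exists"
        P |= Pc
        C |= {s for s, _ in Pc}                   # States(Pc)
        history.append((frozenset(P), frozenset(C)))  # (P_i, C_i)
    return P, history


# Toy domain: action a in state 1 may lead to state 2 or directly to goal 3.
nxt = {(0, "a"): {1}, (1, "a"): {2, 3}, (2, "a"): {3}}
plan_s, hist_s = ndp(0, {3}, nxt, precomp_strong)
plan_w, hist_w = ndp(0, {3}, nxt, precomp_weak)
print(len(hist_s) - 1, len(hist_w) - 1)  # number of loop iterations
```

In this toy domain the strong precomponent needs one more iteration than the weak one, because state 1 only qualifies once both of its successors are covered.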


B.2 Additional Definitions

This  section  contains  additional  definitions  used  in  the  proofs. 

Definition B.1 (WD) $WD_k(C) = \{s : \mathrm{WDist}(s, C) = k\}$.

Definition B.2 (SD) $SD_k(C) = \{s : \mathrm{SDist}(s, C) = k\}$.

Definition B.3 (FixedPoint) FixedPoint(C) is a set of SAs defined by the algorithm below.


function FixedPoint(C)
1  F ← ∅
2  repeat
3      Fold ← F
4      F ← PreImgSA(States(F) ∪ C) \ (C × Act)
5  until F = Fold
6  return F

Definition B.4 (Adversarial DAG) An adversarial DAG $AD(C)$ of a set of states $C$ is a
graph where the vertices are states and the edges are system actions of a non-deterministic
adversarial planning domain. Each state $q$ in an $AD(C)$ is associated with a level $l(q)$. An
$AD(C)$ is defined inductively as follows

• $c \in C$ are terminal states of $AD(C)$ with $l(c) = 0$,

• if $q'_1, \dots, q'_n$ are states in $AD(C)$ and for each applicable environment action in
$\mathrm{Act}_e(q) = \{e_1, \dots, e_n\}$ of a state $q$ there exists a counter system action $s_1, \dots, s_n$ in
$\mathrm{App}_s(q)$ such that $q \xrightarrow{s_i, e_i} q'_i$ for $1 \leq i \leq n$, then $q$ is an internal state of $AD(C)$ with
outgoing edges $s_1, \dots, s_n$ to $q'_1, \dots, q'_n$. The level of $q$ is $l(q) = \max_{i=1}^{n} l(q'_i) + 1$.
Figure B.2: An adversarial DAG of a set of states $\{c_1, c_2, c_3, c_4\}$. The number
shown next to a state is its level.


Notice that an $AD(C)$ is a Directed Acyclic Graph (DAG) since no edge of a state at level
$l$ leads to a state at level $l' \geq l$. For a state $q$ in $AD(C)$, let $q_{SA}$ denote the SSAs formed by
pairing $q$ with its outgoing edges. Similarly, let $AD_{SA}(C)$ denote the union of the SSAs of
each state in $AD(C)$.

Example B.1 Figure B.2 shows an adversarial DAG $AD(C)$ of a set of states $C = \{c_1, c_2, c_3, c_4\}$.
It is assumed that the set of system actions is $\mathrm{Act}_s = \{x, y, z\}$. We have
$AD_{SA}(C) = \{(k, x), (k, y), (k, z), (q, z), (q, y), (q, x), (v, y), (v, x), (p, y), (p, z), (s, z), (s, x)\}$. ◊

Definition B.5 (Weak Marking) A weak marking $AD_w(q, C)$ of an adversarial DAG
$AD(C)$ is a subset of $AD(C)$ defined by marking states and edges in $AD(C)$ reachable
from $q$.

The weak marking $AD_w(q, C)$ is undefined if $q$ is not a state of $AD(C)$.

Example B.2 Figure B.3 shows the weak marking of the adversarial DAG of Example B.1. ◊

Definition B.6 (Strong Cyclic Marking) A strong cyclic marking $AD_{sc}(q, C)$ of an
adversarial DAG $AD(C)$ is a subset of $AD(C)$ defined by, recursively from $q$, marking each
outgoing edge and each state that is reachable by any joint action made up by an applicable
environment action and a system counter action.

The strong cyclic marking $AD_{sc}(q, C)$ is undefined if a state not in $AD(C)$ needs to be
marked.

Figure B.3: The weak marking from state $q$ of the adversarial DAG shown in
Figure B.2. States and edges in the marking are emphasized.


Example B.3 Figure B.4 shows the strong cyclic marking from $q$ of the adversarial DAG
of Example B.1. ◊


B.3  NDP 

Let $Q_i$ be the set of states for which a solution is found in iteration $i$ of the while loop of
NDP. That is, $Q_0 = G$ and $Q_i = \mathrm{States}(P_{c_i})$ for $i > 0$.

Lemma B.1 $C_i = \bigcup_{j=0}^{i} Q_j$.

Proof. This follows directly from $C_0 = G$, $Q_0 = G$, and $C_i = C_{i-1} \cup Q_i$ for $i > 0$. □

Lemma B.2 $Q_i \cap Q_j = \emptyset$ for $i \neq j$.

Proof. Assume without loss of generality that $i > j$. Then by Lemma B.1, $C_{i-1} \supseteq Q_j$.
By the definition of valid precomponents, we have $C_{i-1} \cap \mathrm{States}(\mathrm{PreComp}(C_{i-1})) = \emptyset$, which
gives $Q_i \cap Q_j = \emptyset$. □


Lemma B.3 If $C_i$ exists then $C_i \supset C_{i-1}$.

Proof. If $C_i$ is computed then $C_i = C_{i-1} \cup \mathrm{States}(P_{c_i})$ and $\mathrm{States}(P_{c_i}) \neq \emptyset$. We have
$P_{c_i} = \mathrm{PreComp}(C_{i-1})$. By the definition of valid precomponents, we have
$\mathrm{States}(P_{c_i}) \cap C_{i-1} = \emptyset$. Thus, $C_i \supset C_{i-1}$. □


Theorem B.1 (Termination) NDP terminates.
Figure B.4: The strong cyclic marking from state $q$ of the adversarial DAG shown
in Figure B.2. States and edges in the marking are emphasized. Dashed edges are
not a part of the adversarial DAG but are only used for marking. These edges denote
joint actions where the environment action is paired with another system action than
its counter action. Consider state $q$: since there are three edges from this state, we have
$|\mathrm{App}_e(q)| \geq 3$. In the figure, we assume that $|\mathrm{App}_e(q)| = 3$. This gives a total of 9
joint actions used for marking. Only 3 of these are environment actions paired with
their counter system action.


Proof. By the definition of valid precomponents, the function PreComp called by NDP
terminates. By Lemma B.3, we have that $C_i \supset C_{i-1}$ after completion of iteration $i$.
However, since the state space is finite, the number of iterations then also must be finite. □
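The counting argument behind the last step can be spelled out as follows. The symbol $S$ for the finite set of states is an assumption of this note and is not notation taken from the proof above.

```latex
% Strict growth of C_i (Lemma B.3) inside a finite state space S
% bounds the number of completed iterations i.
\[
  |G| = |C_0| < |C_1| < \dots < |C_i| \le |S|
  \quad\Longrightarrow\quad
  i \le |S| - |G| .
\]
```

Each completed iteration adds at least one new state to $C$, so under this assumption the loop body is completed at most $|S| - |G|$ times.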


B.4  Strong 

Lemma B.4 PreCompS is a valid precomponent function.

Proof. Follows directly from the definition of PreCompS and PreImgSA. □

Lemma B.5 If $\pi = \mathrm{PreCompS}(C)$ then $\mathcal{M}(\pi), \mathrm{States}(\pi) \models \mathrm{AF}\, C$.

Proof. By definition of $\mathrm{PreCompS}(C)$, $\pi = \{(s, a) : s \notin C \wedge \mathrm{Next}(s, a) \cap C \neq \emptyset \wedge \mathrm{Next}(s, a) \cap \overline{C} = \emptyset\}$.
Thus, for any $s \in \mathrm{States}(\pi)$, we have for each $s'$ where
$(s, s') \in R$ of $\mathcal{M}(\pi)$ that $s' \in C$, which implies $\mathcal{M}(\pi), s \models \mathrm{AF}\, C$. □


Lemma B.6 For Strong($s_0$, $G$), we have $\mathcal{M}(P_i), C_i \models \mathrm{AF}\, G$.

Proof. By induction on $i$.

Case $i = 0$. We have $\mathcal{M}(\emptyset), G \models \mathrm{AF}\, G$, $C_0 = G$, and $P_0 = \emptyset$. Thus, $\mathcal{M}(P_0), C_0 \models \mathrm{AF}\, G$.

Case $i > 0$. The induction hypothesis is $\mathcal{M}(P_{i-1}), C_{i-1} \models \mathrm{AF}\, G$. By Lemma B.5 and
$P_{c_i} = \mathrm{PreCompS}(C_{i-1})$ we get $\mathcal{M}(P_{c_i}), \mathrm{States}(P_{c_i}) \models \mathrm{AF}\, C_{i-1}$. Combined with the
induction hypothesis, we get $\mathcal{M}(P_{c_i} \cup P_{i-1}), \mathrm{States}(P_{c_i}) \cup C_{i-1} \models \mathrm{AF}\, G$ which is equal
to $\mathcal{M}(P_i), C_i \models \mathrm{AF}\, G$. □

Theorem B.2 (Soundness) Strong is sound.

Proof. If Strong($s_0$, $G$) returns a solution $\pi$ after iteration $i$ then $\pi = P_i$ and $s_0 \in C_i$.
Thus, by Lemma B.6, $\mathcal{M}(\pi), s_0 \models \mathrm{AF}\, G$. □

Lemma B.7 For Strong($s_0$, $G$), we have $s \in Q_i \Leftrightarrow \mathrm{SDist}(s, G) = i$.

Proof. By induction on $i$.

Case $i = 0$. By definition of $Q_0$ and SDist, we have $s \in Q_0 \Leftrightarrow s \in G \Leftrightarrow \mathrm{SDist}(s, G) = 0$.

Case $i > 0$. The induction hypothesis is $s \in Q_j \Leftrightarrow \mathrm{SDist}(s, G) = j$ for $j < i$.

"$\Rightarrow$": Assume $s \in Q_i$. By definition of $Q_i$ and PreCompS, we get

$s \in \mathrm{States}(\{(s', a') : s' \notin C_{i-1} \wedge \mathrm{Next}(s', a') \cap C_{i-1} \neq \emptyset \wedge \mathrm{Next}(s', a') \cap \overline{C_{i-1}} = \emptyset\})$.

This implies that there exists a set of SAs $\pi$ such that $\mathcal{M}(\pi), s \models \mathrm{AF}\, C_{i-1}$. Combining this
with the induction hypothesis, we get $\mathrm{SDist}(s, G) \leq i$. However, if $\mathrm{SDist}(s, G) = k < i$ then
$s \in Q_k$ according to the induction hypothesis. This is in conflict with the assumption that
$s \in Q_i$. Thus, $\mathrm{SDist}(s, G) = i$.

"$\Leftarrow$": Assume $\mathrm{SDist}(s, G) = i$. By definition of SDist, there exists an action $a$ such
that $\forall s' \in \mathrm{Next}(s, a) \,.\, \mathrm{SDist}(s', G) \leq i - 1$. Thus, by the induction hypothesis and
Lemma B.1, $\mathrm{Next}(s, a) \subseteq C_{i-1}$. It must be the case that $s \notin C_{i-1}$ since $s \in C_{i-1}$
contradicts the induction hypothesis. But then by definition of PreCompS,
$s \in \mathrm{States}(\mathrm{PreCompS}(C_{i-1}))$ which by definition of $Q_i$ gives $s \in Q_i$. □
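The statement of Lemma B.7 can be checked on a small explicit-state example. The Python sketch below is an illustration, not part of the proof. It assumes that SDist(s, G) is 0 for goal states and otherwise 1 plus the minimum over actions of the maximum SDist of the successors (infinite if no such action exists), which is how the definition is used in the proof above. The dictionary nxt, mapping (s, a) to Next(s, a), and all function names are hypothetical.

```python
import math


def strong_layers(G, nxt):
    """Return [Q_0, Q_1, ...] computed with the strong precomponent."""
    C, layers = set(G), [set(G)]
    while True:
        # States of PreCompS(C): outside C, some action with all successors in C.
        Q = {s for (s, a), succ in nxt.items()
             if s not in C and succ and succ <= C}
        if not Q:
            return layers
        layers.append(Q)
        C |= Q


def strong_distance(states, G, nxt):
    """Min-max distance to G, computed by iterating to a fixed point."""
    d = {s: (0 if s in G else math.inf) for s in states}
    changed = True
    while changed:
        changed = False
        for (s, a), succ in nxt.items():
            if s in G or not succ:
                continue
            cand = 1 + max(d[t] for t in succ)   # worst outcome of action a
            if cand < d[s]:
                d[s], changed = cand, True
    return d


# Toy domain: state 4 is a dead end, and action b in state 1 may loop.
nxt = {
    (0, "a"): {1}, (0, "b"): {2, 4},
    (1, "a"): {2, 3}, (1, "b"): {1, 3},
    (2, "a"): {3}, (4, "a"): {4},
}
layers = strong_layers({3}, nxt)
dist = strong_distance(range(5), {3}, nxt)
print(layers, dist)
```

On this domain every state in layer $Q_i$ has distance $i$, and the dead-end state 4 appears in no layer and keeps an infinite distance.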

Theorem B.3 (Completeness) Strong is complete.

Proof. If a valid solution exists then $\mathrm{SDist}(s_0, G) \neq \infty$. Assume $\mathrm{SDist}(s_0, G) = k$.
Then by definition of SDist there exist $k + 1$ states $q_0, \dots, q_k$ such that $\mathrm{SDist}(q_i, G) = i$,
$q_k = s_0$, and $q_0 \in G$. Thus, by Lemma B.7, $Q_i \neq \emptyset$ for $i \leq k$ and $s_0 \in Q_k$. Since
$Q_i = \mathrm{States}(P_{c_i})$, we have $P_{c_i} \neq \emptyset$ for $1 \leq i \leq k$. Thus, Strong($s_0$, $G$) will not terminate
with failure in the first $k$ iterations. Also it will not terminate with success in the first
$k - 1$ iterations since by Lemma B.2, $s_0 \notin Q_i$ for $i < k$. However, since $s_0 \in Q_k$, it will
terminate with success in iteration $k$. □
Lemma B.8 For Strong($s_0$, $G$), we have $\forall s \in Q_i \,.\, \mathrm{Max}(s, G, P_i) = i$.

Proof. By induction on $i$.

Case $i = 0$: Let $s \in Q_0$. By definition of $Q_0$, $s \in G$, but then by definition of Max,
$\mathrm{Max}(s, G, \emptyset) = \mathrm{Max}(s, G, P_0) = 0$.

Case $i > 0$: The induction hypothesis is $\forall s \in Q_j \,.\, \mathrm{Max}(s, G, P_j) = j$ for $j < i$. Let
$s \in Q_i$. Then by definition of $Q_i$ and $\mathrm{PreCompS}(C_{i-1})$,
$s \in \mathrm{States}(\{(s', a') : s' \notin C_{i-1} \wedge \mathrm{Next}(s', a') \cap C_{i-1} \neq \emptyset \wedge \mathrm{Next}(s', a') \cap \overline{C_{i-1}} = \emptyset\})$.
Thus, by definition of PreCompS,
$\forall (s', a') \in P_{c_i} \,.\, \mathrm{Next}(s', a') \subseteq C_{i-1} \wedge \mathrm{Next}(s', a') \cap C_{i-1} \neq \emptyset$. We
have $P_i = P_{i-1} \cup P_{c_i}$ and $s \in \mathrm{States}(P_{c_i})$. Further, it follows from the definition
of $C_i$ and Lemma B.1 and Lemma B.2 that $P_{i-1} \cap P_{c_i} = \emptyset$. Thus, by the induction
hypothesis and by definition of Max, $\mathrm{Max}(s, G, P_i) \leq i$. However, $\mathrm{Max}(s, G, P_i) < i$
implies $\mathrm{SDist}(s, G) < i$ which contradicts Lemma B.7, since we assume $s \in Q_i$. Thus,
$\mathrm{Max}(s, G, P_i) = i$. □


Theorem B.4 (Optimality) If $\pi$ is a solution returned by Strong($s_0$, $G$) then

$\mathrm{Max}(s_0, G, \pi) = \mathrm{SDist}(s_0, G)$.

Proof. Combining Lemma B.7 and Lemma B.8 gives

$\forall s \in Q_i \,.\, \mathrm{Max}(s, G, P_i) = \mathrm{SDist}(s, G)$

which implies the result. □


B.5  Weak 

Lemma B.9 PreCompW is a valid precomponent function.

Proof. This follows directly from the definition of PreCompW and PreImgSA. □

Lemma B.10 If $\pi = \mathrm{PreCompW}(C)$ then $\mathcal{M}(\pi), \mathrm{States}(\pi) \models \mathrm{EF}\, C$.

Proof. By definition of $\mathrm{PreCompW}(C)$, we have $\pi = \{(s, a) : s \notin C \wedge \mathrm{Next}(s, a) \cap C \neq \emptyset\}$.
Thus, for any $s \in \mathrm{States}(\pi)$, we have $\exists s' \in C \,.\, (s, s') \in R$ of $\mathcal{M}(\pi)$. This
implies $\mathcal{M}(\pi), s \models \mathrm{EF}\, C$. Thus, $\mathcal{M}(\pi), \mathrm{States}(\pi) \models \mathrm{EF}\, C$. □


Lemma B.11 For Weak($s_0$, $G$), we have $\mathcal{M}(P_i), C_i \models \mathrm{EF}\, G$.

Proof. By induction on $i$.

Case $i = 0$. We have $\mathcal{M}(\emptyset), G \models \mathrm{EF}\, G$, $C_0 = G$, and $P_0 = \emptyset$. Thus, $\mathcal{M}(P_0), C_0 \models \mathrm{EF}\, G$.

Case $i > 0$. The induction hypothesis is $\mathcal{M}(P_{i-1}), C_{i-1} \models \mathrm{EF}\, G$. By Lemma B.10, the
induction hypothesis, and $P_{c_i} = \mathrm{PreCompW}(C_{i-1})$, we get

$\mathcal{M}(P_{c_i} \cup P_{i-1}), \mathrm{States}(P_{c_i}) \cup C_{i-1} \models \mathrm{EF}\, G$

which is equal to $\mathcal{M}(P_i), C_i \models \mathrm{EF}\, G$. □

Theorem B.5 (Soundness) Weak is sound.

Proof. If Weak($s_0$, $G$) returns a solution $\pi$ after iteration $i$ then $\pi = P_i$ and $s_0 \in C_i$. Thus
by Lemma B.11, $\mathcal{M}(\pi), s_0 \models \mathrm{EF}\, G$. □

Lemma B.12 For Weak($s_0$, $G$), we have $s \in Q_i \Leftrightarrow \mathrm{WDist}(s, G) = i$.

Proof. By induction on $i$.

Case $i = 0$. By definition of $Q_0$ and WDist, $s \in Q_0 \Leftrightarrow s \in G \Leftrightarrow \mathrm{WDist}(s, G) = 0$.

Case $i > 0$. The induction hypothesis is $s \in Q_j \Leftrightarrow \mathrm{WDist}(s, G) = j$ for $j < i$.

"$\Rightarrow$": Assume that $s \in Q_i$. By definition of $Q_i$ and PreCompW, we get

$s \in \mathrm{States}(\{(s', a') : s' \notin C_{i-1} \wedge \mathrm{Next}(s', a') \cap C_{i-1} \neq \emptyset\})$.

Thus by the induction hypothesis and Lemma B.1, $\mathrm{WDist}(s, G) \leq i$. However, if
$\mathrm{WDist}(s, G) = k < i$ then $s \in Q_k$ according to the induction hypothesis which is in
conflict with the assumption that $s \in Q_i$. Thus, $\mathrm{WDist}(s, G) = i$.

"$\Leftarrow$": Assume $\mathrm{WDist}(s, G) = i$. Then by definition of WDist,
$\exists s', a \,.\, s' \in \mathrm{Next}(s, a) \wedge \mathrm{WDist}(s', G) = i - 1$. Thus, by the induction hypothesis and
Lemma B.1, $s' \in C_{i-1}$. It must be the case that $s \notin C_{i-1}$ since $s \in C_{i-1}$ contradicts the
induction hypothesis. But then by definition of PreCompW, $s \in \mathrm{States}(\mathrm{PreCompW}(C_{i-1}))$
which by definition of $Q_i$ gives $s \in Q_i$. □

Theorem  B.6  (Completeness)  Weak  is  complete. 

Proof. If a valid solution exists then $\mathrm{WDist}(s_0, G) \neq \infty$. Assume $\mathrm{WDist}(s_0, G) = k$.
Then by definition of WDist there exist $k + 1$ states $q_0, \ldots, q_k$ such that $\mathrm{WDist}(q_i, G) = i$
for $0 \le i \le k$, $q_k = s_0$, and $q_0 \in G$. Thus by Lemma B.12, $Q_i \neq \emptyset$ for $0 \le i \le k$, and
$s_0 \in Q_k$. Since $Q_i = \mathrm{States}(P_{c_i})$, we have $P_{c_i} \neq \emptyset$ for $1 \le i \le k$. Thus, Weak$(s_0, G)$
will not terminate with failure in the first $k$ iterations. Also, it will not terminate with
success in the first $k - 1$ iterations, since by Lemma B.2 $s_0 \notin Q_i$ for $0 \le i < k$. However,
since $s_0 \in Q_k$, it will terminate with success in iteration $k$. □

Lemma B.13 For Weak$(s_0, G)$, we have $\forall s \in Q_i \,.\, \mathrm{Min}(s, G, P_i) = i$.

Proof. By induction on $i$.

Case $i = 0$. Let $s \in Q_0$. By definition of $Q_0$, we have $s \in G$. But then by definition of
Min, $\mathrm{Min}(s, G, \emptyset) = \mathrm{Min}(s, G, P_0) = 0$.

Case $i > 0$. The induction hypothesis is $\forall s \in Q_j \,.\, \mathrm{Min}(s, G, P_j) = j$ for $j < i$. Let
$s \in Q_i$. Then by definition of $Q_i$ and $\mathrm{PreCompW}(C_{i-1})$,

\[ s \in \mathrm{States}(\{(s', a') : s' \notin C_{i-1} \wedge \mathrm{Next}(s', a') \cap C_{i-1} \neq \emptyset\}). \]

Thus, there exists an action $a$ such that $(s, a) \in P_{c_i}$ and $\mathrm{Next}(s, a) \cap C_{i-1} \neq \emptyset$. Since
$P_i = P_{i-1} \cup P_{c_i}$, we get from the induction hypothesis that $\mathrm{Min}(s, G, P_i) \le i$. However,
$\mathrm{Min}(s, G, P_i) < i$ implies $\mathrm{WDist}(s, G) < i$, which contradicts Lemma B.12 since we
assume that $s \in Q_i$. Thus, $\mathrm{Min}(s, G, P_i) = i$. □

Theorem B.7 (Optimality) If $\pi$ is a solution returned by Weak$(s_0, G)$ then

\[ \mathrm{Min}(s_0, G, \pi) = \mathrm{WDist}(s_0, G). \]

Proof. Combining Lemma B.12 and Lemma B.13 gives

\[ \forall s \in Q_i \,.\, \mathrm{Min}(s, G, P_i) = \mathrm{WDist}(s, G), \]

which implies the result. □
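The layer argument behind Lemma B.12 and Theorem B.7 can be illustrated with an explicit-state sketch. As before, this is not the BDD-based algorithm: the dictionary encoding of Next(s, a), the function names `pre_comp_w`, `weak`, `min_steps`, and the example domain are assumptions made for illustration only.

```python
from collections import deque
from typing import Dict, FrozenSet, Tuple

SA = Tuple[int, str]                 # a state-action pair (s, a)
Domain = Dict[SA, FrozenSet[int]]    # assumed encoding of Next(s, a)

# Hypothetical domain: action "b" in state 1 may fall back to state 0.
DOMAIN: Domain = {
    (0, "a"): frozenset({1}),
    (1, "a"): frozenset({2, 3}),
    (1, "b"): frozenset({0, 3}),
    (2, "a"): frozenset({3}),
}

def pre_comp_w(domain, covered):
    """SAs outside `covered` with at least one outcome inside `covered`."""
    return {sa for sa, nxt in domain.items()
            if sa[0] not in covered and nxt & covered}

def weak(domain, s0, goal):
    """Backward breadth-first weak planning; returns (plan, layers) or None."""
    covered, plan, layers = set(goal), set(), [set(goal)]
    while s0 not in covered:
        pc = pre_comp_w(domain, covered)
        if not pc:
            return None                      # "no solution exists"
        plan |= pc
        layers.append({s for s, _ in pc})    # plays the role of Q_i
        covered |= layers[-1]
    return plan, layers

def min_steps(domain, plan, goal, s0):
    """Length of a shortest execution of `plan` from s0 to the goal (BFS)."""
    dist, queue = {s0: 0}, deque([s0])
    while queue:
        s = queue.popleft()
        if s in goal:
            return dist[s]
        for sa in plan:
            if sa[0] == s:
                for t in domain[sa]:
                    if t not in dist:
                        dist[t] = dist[s] + 1
                        queue.append(t)
    return None
```

A breadth-first search is used for `min_steps` because a weak plan may contain cycles (here state 1 can fall back to state 0), so a naive recursion over outcomes need not terminate.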


B.6  Strong  Cyclic 

Lemma B.14 PreCompSC is a valid precomponent function.

Proof. By inspection of $\mathrm{PreCompSC}(C)$ (1.4) it follows that if $(s, a) \in \mathrm{PreCompSC}(C)$
then $a \in \mathrm{App}(s)$ and $s \notin C$. To prove that $\mathrm{PreCompSC}(C)$ terminates, we first observe
that PruneOutgoing terminates since it consists of a single preimage computation.
PruneUnconnected must also terminate since NewSA clearly grows in each iteration
and the number of SAs is finite. SCPlanAux calls only PruneOutgoing and
PruneUnconnected, which we have just shown terminate. By definition
of PruneOutgoing and PruneUnconnected it is clear that $SA_{i+1} \subseteq SA_i$ in
SCPlanAux. Thus, as long as the loop in SCPlanAux continues, we have $SA_{i+1} \subset SA_i$.
Since the number of SAs is finite, SCPlanAux must eventually terminate. PreCompSC
must terminate since every call to SCPlanAux terminates and wSA grows in
each iteration. Thus, PreCompSC can only complete a finite number of iterations since
the number of SAs is finite. □

Lemma B.15 In each iteration $i$ of PruneUnconnected, we have

\[ \mathcal{M}(\mathit{NewSA}_i), \mathrm{States}(\mathit{NewSA}_i) \models \mathrm{EF}\,C. \]

Proof. By induction on $i$.

Case $i = 0$. $\mathit{NewSA}_0 = \emptyset$, which trivially fulfills the requirement.

Case $i > 0$. The induction hypothesis is $\mathcal{M}(\mathit{NewSA}_{i-1}), \mathrm{States}(\mathit{NewSA}_{i-1}) \models \mathrm{EF}\,C$. We
have

\[ \mathit{NewSA}_i \subseteq \mathrm{PreImgSA}(C \cup \mathrm{States}(\mathit{NewSA}_{i-1})) \subseteq \{(s, a) : \mathrm{Next}(s, a) \cap (C \cup \mathrm{States}(\mathit{NewSA}_{i-1})) \neq \emptyset\}. \]

This means $\mathcal{M}(\mathit{NewSA}_i), \mathrm{States}(\mathit{NewSA}_i) \models \mathrm{EF}\,(C \cup \mathrm{States}(\mathit{NewSA}_{i-1}))$. Combined
with the induction hypothesis, we get $\mathcal{M}(\mathit{NewSA}_i \cup \mathit{NewSA}_{i-1}), \mathrm{States}(\mathit{NewSA}_i) \cup
\mathrm{States}(\mathit{NewSA}_{i-1}) \models \mathrm{EF}\,C$. Since clearly $\mathit{NewSA}_i \supseteq \mathit{NewSA}_{i-1}$, we have

\[ \mathcal{M}(\mathit{NewSA}_i), \mathrm{States}(\mathit{NewSA}_i) \models \mathrm{EF}\,C. \]

□

Lemma B.16 If SCPlanAux$(\mathit{startSA}, C)$ returns $\pi$ then $\mathcal{M}(\pi), \mathrm{States}(\pi) \models \mathrm{AG}\,\mathrm{EF}\,C$.

Proof. By inspection of SCPlanAux, we have

\[ \pi = \mathrm{PruneUnconnected}(\mathrm{PruneOutgoing}(\pi, C), C). \]

By definition of PruneOutgoing, if $(s, a) \in \pi$ then $(s, a) \notin \mathrm{PreImgSA}(\overline{C \cup \mathrm{States}(\pi)})$.
Thus, if $s' \notin C \cup \mathrm{States}(\pi)$ then $(s, s') \notin R$ of $\mathcal{M}(\pi)$. This proves that any execution path
in $\mathrm{Exec}(q_0, \pi)$ where $q_0 \in \mathrm{States}(\pi)$ cannot reach a state outside of $C \cup \mathrm{States}(\pi)$.
However, we still need to prove that there is an execution path reaching $C$ for each state in
$\pi$. Since $\pi$ is returned from PruneUnconnected, we have $\pi = \mathit{NewSA}_i$ for some
iteration $i$ of PruneUnconnected. From Lemma B.15, we then get $\mathcal{M}(\pi), \mathrm{States}(\pi) \models
\mathrm{AG}\,\mathrm{EF}\,C$. □

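Lemma B.17 For StrongCyclic$(s_0, G)$, we have $\mathcal{M}(P_i), C_i \models \mathrm{AG}\,\mathrm{EF}\,G$.

Proof. By induction on $i$.

Case $i = 0$. We have $\mathcal{M}(\emptyset), G \models \mathrm{AG}\,\mathrm{EF}\,G$, $C_0 = G$, and $P_0 = \emptyset$. Thus, $\mathcal{M}(P_0), C_0 \models
\mathrm{AG}\,\mathrm{EF}\,G$.

Case $i > 0$. The induction hypothesis is $\mathcal{M}(P_{i-1}), C_{i-1} \models \mathrm{AG}\,\mathrm{EF}\,G$. By Lemma B.16,
$\mathcal{M}(P_{c_i}), \mathrm{States}(P_{c_i}) \models \mathrm{AG}\,\mathrm{EF}\,C_{i-1}$. Combined with the induction hypothesis this gives
$\mathcal{M}(P_{c_i} \cup P_{i-1}), \mathrm{States}(P_{c_i}) \cup C_{i-1} \models \mathrm{AG}\,\mathrm{EF}\,G$, which is equal to $\mathcal{M}(P_i), C_i \models \mathrm{AG}\,\mathrm{EF}\,G$.

□

Theorem B.8 (Soundness) StrongCyclic is sound.

Proof. If StrongCyclic$(s_0, G)$ returns a solution $\pi$ after iteration $i$ then $\pi = P_i$ and
$s_0 \in C_i$. Thus by Lemma B.17, $\mathcal{M}(\pi), s_0 \models \mathrm{AG}\,\mathrm{EF}\,G$. □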
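The pruning functions used in Lemmas B.14 to B.17 can be sketched on explicit sets. The code below is our reading of PruneOutgoing, PruneUnconnected, SCPlanAux and PreCompSC as they are described in the proofs, not the BDD-based implementation; the dictionary encoding of Next(s, a), all function names, and the example domain are assumptions.

```python
from typing import Dict, FrozenSet, Tuple

SA = Tuple[int, str]                 # a state-action pair (s, a)
Domain = Dict[SA, FrozenSet[int]]    # assumed encoding of Next(s, a)

# Hypothetical domain: "try" may loop in state 0; state 3 is a dead end.
DOMAIN: Domain = {
    (0, "try"): frozenset({0, 1}),
    (0, "risky"): frozenset({1, 3}),
    (1, "go"): frozenset({2}),
    (1, "bad"): frozenset({3}),
}

def states(sa):
    return {s for s, _ in sa}

def pre_img(domain, target):
    """SAs with at least one outcome in `target`."""
    return {sa for sa, nxt in domain.items() if nxt & target}

def prune_outgoing(domain, sa, c):
    """Drop SAs that may leave c together with the states of `sa`."""
    inside = c | states(sa)
    return {x for x in sa if domain[x] <= inside}

def prune_unconnected(domain, sa, c):
    """Keep the SAs of `sa` lying on some path to c (growing fixed point)."""
    new_sa = set()
    while True:
        grown = sa & pre_img(domain, c | states(new_sa))
        if grown == new_sa:
            return new_sa
        new_sa = grown

def sc_plan_aux(domain, start_sa, c):
    sa = set(start_sa)
    while True:
        pruned = prune_unconnected(domain, prune_outgoing(domain, sa, c), c)
        if pruned == sa:
            return sa
        sa = pruned                          # strictly smaller, so this terminates

def pre_comp_sc(domain, c):
    w_sa = set()
    while True:
        grown = w_sa | {x for x in pre_img(domain, c | states(w_sa)) if x[0] not in c}
        sc = sc_plan_aux(domain, grown, c)
        if sc or grown == w_sa:
            return sc
        w_sa = grown

def strong_cyclic(domain, s0, goal):
    covered, plan = set(goal), set()
    while s0 not in covered:
        pc = pre_comp_sc(domain, covered)
        if not pc:
            return None                      # "no solution exists"
        plan |= pc
        covered |= states(pc)
    return plan
```

In this domain the action "risky" is pruned by `prune_outgoing` because it may lead to the dead-end state 3, while "try" is kept even though it may loop in state 0, since every state it can reach still has a path to the goal.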


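Lemma B.18 If $\mathcal{M}(\pi), s_0 \models \mathrm{AG}\,\mathrm{EF}\,G$ and $G \subseteq C$ then

\[ \mathcal{M}(\pi \setminus C \times \mathit{Act}), s_0 \models \mathrm{AG}\,\mathrm{EF}\,C. \]

Proof. Clearly $\mathcal{M}(\pi), s_0 \models \mathrm{AG}\,\mathrm{EF}\,C$ since $G \subseteq C$. In addition, we can remove SAs within
$C$ from $\pi$ since they are not necessary to fulfill $\mathcal{M}(\pi), s_0 \models \mathrm{AG}\,\mathrm{EF}\,C$. Thus,

\[ \mathcal{M}(\pi \setminus C \times \mathit{Act}), s_0 \models \mathrm{AG}\,\mathrm{EF}\,C. \]

□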


Lemma B.19 If $\mathcal{M}(\pi), s_0 \models \mathrm{AG}\,\mathrm{EF}\,G$ then there exists a $\pi' \subseteq \mathrm{FixedPoint}(G)$ where
$\mathcal{M}(\pi'), s_0 \models \mathrm{AG}\,\mathrm{EF}\,G$.

Proof. Let $P$ denote the prefixes of any execution path in $\mathrm{Exec}(s_0, \pi)$ that starts in $s_0$
and ends in a state in $G$. Let $\pi'$ denote the SAs associated with the paths in $P$. We have
$\mathcal{M}(\pi'), s_0 \models \mathrm{AG}\,\mathrm{EF}\,G$ since otherwise there would exist an execution path in $\mathrm{Exec}(s_0, \pi)$
reaching a state from which $G$ is unreachable. By definition of PreImgSA, we have
that $\mathrm{FixedPoint}(G)$ contains any SA associated with any finite path that starts from a
state in $\overline{G}$ and ends at a state in $G$. Thus, due to the definition of $\pi'$, we have $\pi' \subseteq
\mathrm{FixedPoint}(G)$. □

Theorem  B.9  (Completeness)  StrongCyclic  is  complete. 

Proof. By contradiction. Assume that $\mathcal{M}(\pi), s_0 \models \mathrm{AG}\,\mathrm{EF}\,G$, but StrongCyclic$(s_0, G)$
terminates in iteration $i$ with "no solution exists". The $i$th call to PreCompSC is finding
a strong cyclic precomponent of $C_{i-1}$. Since StrongCyclic terminates in iteration
$i$, we must have that $\mathrm{PreCompSC}(C_{i-1}) = \emptyset$. Since wSA in $\mathrm{PreCompSC}(C_{i-1})$
is updated in the same way as $F$ in the FixedPoint algorithm, there must exist an
iteration $k$ of $\mathrm{PreCompSC}(C_{i-1})$ where $\mathit{wSA}_k = \mathrm{FixedPoint}(C_{i-1})$. Let $\pi' = \pi \setminus
C_{i-1} \times \mathit{Act}$. Since $G \subseteq C_{i-1}$, Lemma B.18 gives that $\mathcal{M}(\pi'), s_0 \models \mathrm{AG}\,\mathrm{EF}\,C_{i-1}$. But
then by Lemma B.19, there exists a set of SAs $\pi'' \subseteq \mathrm{FixedPoint}(C_{i-1}) = \mathit{wSA}_k$
such that $\mathcal{M}(\pi''), s_0 \models \mathrm{AG}\,\mathrm{EF}\,C_{i-1}$. Consider the pruning of $\mathit{wSA}_k$ in SCPlanAux.
According to the proof of Lemma B.19, $\pi''$ can be chosen such that it has no SAs leading
out from $\mathrm{States}(\pi'') \cup C_{i-1}$, and any SA in $\pi''$ is associated with an execution path
connected to $C_{i-1}$. Thus, no SAs will be pruned from $\pi''$ by PruneOutgoing and
PruneUnconnected. Consequently, SCPlanAux returns a non-empty result which
in turn causes $\mathrm{PreCompSC}(C_{i-1})$ to return a non-empty result, which is impossible. □


B.7  GNDP

For all guided non-deterministic planning algorithms, we will assume that there are $n$ partitions
of the disjunctive branching partitioning used by the algorithms. Further, recall that
for a map M, M denotes the union of the entries in M. To simplify the presentation, we assume
that for a queue $Q$, the symbol $Q$ both denotes the queue and the union of the entries
in it.

Lemma B.20 If $C_i$ exists then $C_i \supset C_{i-1}$.

Proof. If $C_i$ is computed then by lines 6-7, $C_i = C_{i-1} \cup \mathrm{States}(P_{c_i})$ where $\mathrm{States}(P_{c_i}) \neq
\emptyset$ (otherwise GNDP terminates in line 4). By definition of valid guided precomponents,
we have $\mathrm{States}(P_{c_i}) \cap C_{i-1} = \emptyset$. Thus, $C_i \supset C_{i-1}$. □

Theorem  B.10  (Termination)  GNDP  terminates. 

Proof. By definition of valid guided precomponents, the function GPreComp called by
GNDP must terminate. By Lemma B.20, we have that $C_i \supset C_{i-1}$ after completion of
iteration $i$. However, since the state space is finite the number of iterations must also be
finite. □


B.8  Guided  Strong 

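Lemma B.21 For GPreCompS$(C)$, we have

\[ \mathrm{PreCompS}(C) = \bigcup_{j=1}^{|C|} \bigcup_{i=1}^{n} \mathrm{PreCompS}_i(C, C[h_j]). \]

Proof. By definition of $\mathrm{PreCompS}_i$,

\[ \bigcup_{i=1}^{n} \mathrm{PreCompS}_i(C, C[h_j]) = \bigcup_{i=1}^{n} \bigl(\mathrm{PreImgSA}_i(C[h_j]) \setminus \mathrm{PreImgSA}(\overline{C})\bigr) \setminus C \times \mathit{Act}. \]

Thus, by definition of PreImgSA and the fact that a disjunctive partitioning contains all
the transitions of the transition relation,

\begin{align*}
\bigcup_{i=1}^{n} \mathrm{PreCompS}_i(C, C[h_j])
  &= \bigcup_{i=1}^{n} \{(s, a) : (s, a) \in \mathrm{PreImgSA}_i(C[h_j]) \wedge (s, a) \notin \mathrm{PreImgSA}(\overline{C}) \wedge s \notin C\} \\
  &= \{(s, a) : (s, a) \in \mathrm{PreImgSA}(C[h_j]) \wedge (s, a) \notin \mathrm{PreImgSA}(\overline{C}) \wedge s \notin C\}.
\end{align*}

From this we get

\begin{align*}
\bigcup_{j=1}^{|C|} \bigcup_{i=1}^{n} \mathrm{PreCompS}_i(C, C[h_j])
  &= \{(s, a) : (s, a) \in \bigcup_{j=1}^{|C|} \mathrm{PreImgSA}(C[h_j]) \wedge (s, a) \notin \mathrm{PreImgSA}(\overline{C}) \wedge s \notin C\} \\
  &= \{(s, a) : (s, a) \in \mathrm{PreImgSA}(C) \wedge (s, a) \notin \mathrm{PreImgSA}(\overline{C}) \wedge s \notin C\} \\
  &= \mathrm{PreCompS}(C).
\end{align*}

□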
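The identity of Lemma B.21 can be checked on a toy instance. The sketch below assumes a transition relation given as a set of (s, a, s') triples, an arbitrary split of that set into partitions, and a map from h-values to the states of C; these encodings, the function names, and the data are illustrative assumptions and do not reproduce the branching partitioning of the planner.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[int, str, int]        # a transition (s, a, s')

STATES: Set[int] = {0, 1, 2, 3, 4}
TRANS: Set[Triple] = {
    (0, "a", 1), (0, "a", 2), (0, "b", 3),
    (4, "a", 1), (4, "a", 3), (1, "a", 1),
}
# Hypothetical disjunctive partitioning: every transition is in exactly one part.
PARTITIONS: List[Set[Triple]] = [
    {(0, "a", 1), (0, "b", 3), (1, "a", 1)},
    {(0, "a", 2), (4, "a", 1), (4, "a", 3)},
]
C: Set[int] = {1, 2}
C_BY_H: Dict[int, Set[int]] = {5: {1}, 6: {2}}   # assumed h-values of C's entries

def pre_img(trans, target):
    """SAs with at least one transition into `target`."""
    return {(s, a) for (s, a, t) in trans if t in target}

def pre_comp_s(trans, all_states, c):
    """(PreImgSA(C) minus PreImgSA(complement of C)) minus C x Act."""
    bad = pre_img(trans, all_states - c)
    return {(s, a) for (s, a) in pre_img(trans, c) - bad if s not in c}

def pre_comp_s_part(trans, part, all_states, c, c_hj):
    """Same, but the positive preimage uses one partition and one entry of C."""
    bad = pre_img(trans, all_states - c)
    return {(s, a) for (s, a) in pre_img(part, c_hj) - bad if s not in c}

def partitioned_union(trans, parts, all_states, c, c_by_h):
    result = set()
    for c_hj in c_by_h.values():
        for part in parts:
            result |= pre_comp_s_part(trans, part, all_states, c, c_hj)
    return result
```

Note that the negative preimage is taken over the whole transition relation in both functions, as in the displayed equations; only the positive preimage is split by partition and by entry of C.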


Lemma B.22 GPreCompS is a valid guided precomponent function.

Proof. By Lemma B.21, $Q$ contains a partitioning of a strong precomponent. For each
partition, the correct $h$-value $h = h_j - \delta h_i$ is associated with the states by Insert. Since a
map with the top node of $Q$ is returned by GPreCompS, the output from GPreCompS has
the correct form. Finally, we have that GPreCompS terminates, since all subcomputations
terminate and the loops are finite. □

Lemma B.23 If $P_c = \mathrm{GPreCompS}(C)$ then $\mathcal{M}(P_c), \mathrm{States}(P_c) \models \mathrm{AF}\,C$.

Proof. From Lemma B.21, we have that $P_c \subseteq \mathrm{PreCompS}(C)$. But then it follows from
the proof of Lemma B.5 that $\mathcal{M}(P_c), \mathrm{States}(P_c) \models \mathrm{AF}\,C$. □

Lemma B.24 For GuidedStrong$(s_0, G)$, we have $\mathcal{M}(P_i), C_i \models \mathrm{AF}\,G$.

Proof. Similar to Lemma B.6 when using Lemma B.23 instead of Lemma B.5. □

Theorem B.11 (Soundness) GuidedStrong is sound.

Proof. If GuidedStrong$(s_0, G)$ returns a solution $\pi$ after iteration $i$ then $\pi = P_i$ and
$s_0 \in C_i$. Thus by Lemma B.24, $\mathcal{M}(\pi), s_0 \models \mathrm{AF}\,G$. □

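Lemma B.25 If $C \supseteq \bigcup_{t=0}^{k-1} SD_t(W)$ and $SD_k(W) \setminus C \neq \emptyset$ then $\mathrm{PreCompS}(C) \neq \emptyset$.

Proof. Let $s \in SD_k(W) \setminus C$. Since $s \in SD_k(W)$, there exists an action $a$ where

•  $(s, a) \in \mathrm{PreImgSA}(C) \supseteq \mathrm{PreImgSA}(\bigcup_{t=0}^{k-1} SD_t(W))$,

•  $(s, a) \notin \mathrm{PreImgSA}(\overline{C}) \subseteq \mathrm{PreImgSA}(\overline{\bigcup_{t=0}^{k-1} SD_t(W)})$.

Since also $s \notin C$, we get by definition of PreCompS that $(s, a) \in \mathrm{PreCompS}(C)$. Thus
$\mathrm{PreCompS}(C) \neq \emptyset$. □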

Lemma B.26 If $SD_k(G) \neq \emptyset$ and GuidedStrong$(s_0, G)$ returns "no solution exists"
then there exists an iteration $i$ of GuidedStrong where $\bigcup_{t=0}^{k} SD_t(G) \subseteq C_i$.

Proof. By induction on $k$.

Case $k = 0$. By definition of SDist, $SD_0(G) = G$. Since $C_0 = G$, we have $i = 0$.

Case $k > 0$. The induction hypothesis is that if $SD_{k-1}(G) \neq \emptyset$ and GuidedStrong$(s_0,
G)$ returns "no solution exists" then there exists an iteration $i'$ of GuidedStrong where
$\bigcup_{t=0}^{k-1} SD_t(G) \subseteq C_{i'}$.

Assume $SD_k(G) \neq \emptyset$. Then by definition of SDist, $SD_{k-1}(G) \neq \emptyset$. Thus, by
the induction hypothesis, GuidedStrong will not terminate before an iteration $i'$ where
$\bigcup_{t=0}^{k-1} SD_t(G) \subseteq C_{i'}$.

Consider an iteration $j \ge i'$. Let $R_j = SD_k(G) \setminus C_j$ denote the states in $SD_k(G)$ not
covered by the plan. By Lemma B.25 and Lemma B.21, the queue in GPreCompS can
only be empty if $R_j = \emptyset$. Thus at some iteration $i \ge i'$ before GuidedStrong terminates
with "no solution exists", it must be the case that $\bigcup_{t=0}^{k} SD_t(G) \subseteq C_i$. □

Theorem  B.12  (Completeness)  GuidedStrong  is  complete. 

Proof. By contradiction. Assume a solution exists, but GuidedStrong$(s_0, G)$ terminates
with failure. Since a solution exists, we have $\mathrm{SDist}(s_0, G) \neq \infty$. Assume $\mathrm{SDist}(s_0, G) =
k$. We then have $SD_k(G) \neq \emptyset$. Thus, by Lemma B.26 GuidedStrong will continue to
an iteration $i$ where $SD_k(G) \subseteq C_i$. Since $s_0 \in SD_k(G)$ this will cause GuidedStrong
to terminate with success, which is impossible. □


B.9  Guided  Weak 

Lemma B.27 For GPreCompW$(C)$, we have

\[ \mathrm{PreCompW}(C) = \bigcup_{j=1}^{|C|} \bigcup_{i=1}^{n} \mathrm{PreCompW}_i(C, C[h_j]). \]

Proof. Similar to Lemma B.21. □


Lemma B.28 GPreCompW is a valid guided precomponent function.

Proof. By Lemma B.27, $Q$ contains a partitioning of a weak precomponent. For each partition,
the correct $h$-value $h = h_j - \delta h_i$ is associated with the states by Insert. Since a
map with the top node of $Q$ is returned by GPreCompW, the output from GPreCompW
has the correct form. Finally, we have that GPreCompW terminates, since all subcomputations
terminate and the loops are finite. □

Lemma B.29 If $P_c = \mathrm{GPreCompW}(C)$ then $\mathcal{M}(P_c), \mathrm{States}(P_c) \models \mathrm{EF}\,C$.

Proof. From Lemma B.27, we have that $P_c \subseteq \mathrm{PreCompW}(C)$. But then it follows from
the proof of Lemma B.10 that $\mathcal{M}(P_c), \mathrm{States}(P_c) \models \mathrm{EF}\,C$. □

Lemma B.30 For GuidedWeak$(s_0, G)$, we have $\mathcal{M}(P_i), C_i \models \mathrm{EF}\,G$.

Proof. Similar to Lemma B.11 when using Lemma B.29 instead of Lemma B.10. □

Theorem B.13 (Soundness) GuidedWeak is sound.

Proof. If GuidedWeak$(s_0, G)$ returns a solution $\pi$ after iteration $i$ then $\pi = P_i$ and
$s_0 \in C_i$. Thus by Lemma B.30, $\mathcal{M}(\pi), s_0 \models \mathrm{EF}\,G$. □

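Lemma B.31 If $C \supseteq \bigcup_{t=0}^{k-1} WD_t(W)$ and $WD_k(W) \setminus C \neq \emptyset$ then
$\mathrm{PreCompW}(C) \neq \emptyset$.

Proof. Let $s \in WD_k(W) \setminus C$. Since $s \in WD_k(W)$, there exists an action $a$ where

\[ (s, a) \in \mathrm{PreImgSA}(C) \supseteq \mathrm{PreImgSA}\Bigl(\bigcup_{t=0}^{k-1} WD_t(W)\Bigr). \]

Since also $s \notin C$, we get by definition of PreCompW that $(s, a) \in \mathrm{PreCompW}(C)$. □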

Lemma B.32 If $WD_k(G) \neq \emptyset$ and GuidedWeak$(s_0, G)$ returns "no solution exists"
then there exists an iteration $i$ of GuidedWeak where $\bigcup_{t=0}^{k} WD_t(G) \subseteq C_i$.

Proof. By induction on $k$.

Case $k = 0$. By definition of WDist, $WD_0(G) = G$. Since $C_0 = G$, we have $i = 0$.

Case $k > 0$. The induction hypothesis is that if $WD_{k-1}(G) \neq \emptyset$ and GuidedWeak$(s_0, G)$
returns "no solution exists" then there exists an iteration $i'$ of GuidedWeak where
$\bigcup_{t=0}^{k-1} WD_t(G) \subseteq C_{i'}$.

Assume $WD_k(G) \neq \emptyset$. Then by definition of WDist, $WD_{k-1}(G) \neq \emptyset$. Thus, by
the induction hypothesis, GuidedWeak will not terminate before an iteration $i'$ where
$\bigcup_{t=0}^{k-1} WD_t(G) \subseteq C_{i'}$.

Consider an iteration $j \ge i'$. Let $R_j = WD_k(G) \setminus C_j$ denote the states in $WD_k(G)$ not
covered by the plan. By Lemma B.31 and Lemma B.27, the queue in GPreCompW can
only be empty if $R_j = \emptyset$. Thus at some iteration $i \ge i'$ before GuidedWeak terminates
with "no solution exists", it must be the case that $\bigcup_{t=0}^{k} WD_t(G) \subseteq C_i$. □

Theorem  B.14  (Completeness)  GuidedWeak  is  complete. 

Proof. By contradiction. Assume a solution exists, but GuidedWeak$(s_0, G)$ terminates
with failure. Since a solution exists, we have $\mathrm{WDist}(s_0, G) \neq \infty$. Assume $\mathrm{WDist}(s_0, G)
= k$. We then have $WD_k(G) \neq \emptyset$. Thus, by Lemma B.32 GuidedWeak will continue to
an iteration $i$ where $WD_k(G) \subseteq C_i$. Since $s_0 \in WD_k(G)$ this will cause GuidedWeak
to terminate with success, which is impossible. □


B.10  Guided  Strong  Cyclic 

Lemma B.33 If $|Q| = 0$ in iteration $i$ of the repeat loop of GPreCompSC$(C)$ then
$\mathit{wSA}_i = \mathrm{FixedPoint}(C)$.

Proof.

"$\subseteq$". Assume that $(s, a) \in \mathit{wSA}_i$. By following the parent nodes in the search tree
implicitly represented by $Q$, we get that $(s, a)$ lies on a path $s\,p'\,p'' \cdots$ connected to $C$ where
$s \xrightarrow{a} p'$. Thus, by definition of $\mathrm{FixedPoint}(C)$, $(s, a) \in \mathrm{FixedPoint}(C)$.

"$\supseteq$". Assume $(s, a) \in \mathrm{FixedPoint}(C)$. Thus, $(s, a)$ lies on a path connected to $C$.
Assume without loss of generality that this path is $p_m \cdots p_0$ where $p_m = s$ and $p_0 \in C$. Before
the first iteration of the repeat loop, Lemma B.27 gives $p_1 \in \mathrm{States}(Q)$. But then $Q$ cannot
be empty before $p_1$ is expanded. If $p_2 \notin \mathrm{States}(\mathit{wSA})$ before $p_1$ is expanded then $p_2$
is inserted in $Q$ when $p_1$ is expanded. Otherwise another node has inserted $p_2$ in $Q$ before
$p_1$ was expanded. In both cases, $Q$ cannot be empty before $p_2$ is inserted. Applying this
argument inductively, we get that $Q$ is not empty before $p_{m-1}$ has been inserted. Thus, at
some point before iteration $i$ of the repeat loop, $(s, a) \in \mathit{wSA}$. □

Lemma  B.34  GPreCompSC  is  a  valid  guided  precomponent  function. 

Proof. Lemma B.33 gives that GPreCompSC$(C)$ returns a map with valid SAs of $C$. By
inspection of GPreCompSC, we see that the Insert function associates states in $Q$ with
their correct $h$-value. Thus, all states in $\mathrm{States}(\mathit{wSA})$ are associated with their correct
$h$-value by wS in line 13. By inspection of lines 21 to 22, we therefore get that the map
of SAs returned by GPreCompSC$(C)$ associates the states of the SAs with their correct
$h$-value.

To prove that GPreCompSC$(C)$ terminates, we first observe that lines 1 to 6 terminate
since the loops are finite. The repeat loop must also terminate, since, as in the proof of
Lemma B.33, wSA reaches a maximum size at some point, due to the finite number of SAs, such
that no new nodes are inserted on $Q$. □

Theorem  B.15  (Soundness)  GuidedStrongCyclic  is  sound. 

Proof. This follows from Lemma B.16 and an adaptation of Lemma B.17. □

Theorem B.16 (Completeness) GuidedStrongCyclic is complete.

Proof. By contradiction. Assume that $\mathcal{M}(\pi), s_0 \models \mathrm{AG}\,\mathrm{EF}\,G$, but GuidedStrongCyclic$(s_0, G)$
terminates in iteration $i$ with "no solution exists". The $i$th call to GPreCompSC
is finding a strong cyclic precomponent of $C_{i-1}$. Since GuidedStrongCyclic terminates
in iteration $i$, we must have $\mathrm{GPreCompSC}(C_{i-1}) = \emptyset$. This is only possible if
$|Q| = 0$ in some iteration of the repeat loop. But then by Lemma B.33, we have that
prior to this $\mathit{wSA} = \mathrm{FixedPoint}(C_{i-1})$. We have shown in the completeness proof of
StrongCyclic that this leads to $\mathrm{SCPlanAux}(\mathit{wSA}) \neq \emptyset$, which is impossible since
GPreCompSC then would return with a non-empty map. □


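B.11  Weak  Adversarial

Lemma B.35 PreCompWA is a valid precomponent function.

Proof. This follows from $\mathrm{PreCompWA}(C) \subseteq \mathrm{PreImgSSA}(C) \setminus C \times \mathit{Act}_s$ and the fact
that both PreImgSSA and FairStates terminate. □

Lemma B.36 If $\pi_s$ is returned by PreCompWA$(C)$ then

\[ \forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(\pi_s, \pi_e), \mathrm{States}_s(\pi_s) \models \mathrm{EF}\,C. \]

Proof. By definition of PreCompWA$(C)$, we have

\begin{align*}
\pi_s = \{(s, a_s) :\ & (s, a_s) \in \mathrm{PreImgSSA}(C) \wedge s \notin C \ \wedge \\
 & \forall a_e \in \mathrm{App}_e(s) \,.\, \exists a_s \in \mathrm{Act}_s(\mathrm{PreImgSSA}(C) \setminus C \times \mathit{Act}_s, s),\ s' \in C \,.\, s \xrightarrow{a_s, a_e} s'\}.
\end{align*}

Let $s \in \mathrm{States}_s(\pi_s)$ and let $\pi_e \in \Pi_e^+$. Then by definition of $\Pi_e^+$, we have $\emptyset \subset
\mathrm{Act}_e(\pi_e, s) \subseteq \mathrm{App}_e(s)$. Assume $a_e \in \mathrm{Act}_e(\pi_e, s)$. Then by definition of $\pi_s$ there exists
a counter system action $a_s$ such that $(s, a_s) \in \pi_s$ and $s \xrightarrow{a_s, a_e} c$ where $c \in C$. By definition of
$R$ of $\mathcal{M}(\pi_s, \pi_e)$, we get $(s, c) \in R$. But then clearly $\mathcal{M}(\pi_s, \pi_e), s \models \mathrm{EF}\,C$. □

Lemma B.37 For WeakAdversarial$(s_0, G)$, we have

\[ \forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(P_i, \pi_e), C_i \models \mathrm{EF}\,G. \]

Proof. By induction on $i$.

Case $i = 0$. For any $\pi_e \in \Pi_e^+$, we have $\mathcal{M}(\emptyset, \pi_e), s \models \mathrm{EF}\,G$ if $s \in G$. Thus, since $C_0 = G$
and $P_0 = \emptyset$, we get $\forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(P_0, \pi_e), C_0 \models \mathrm{EF}\,G$.

Case $i > 0$. The induction hypothesis is $\forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(P_{i-1}, \pi_e), C_{i-1} \models \mathrm{EF}\,G$. From
Lemma B.36 and $P_{c_i} = \mathrm{PreCompWA}(C_{i-1})$, we get $\forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(P_{c_i}, \pi_e), \mathrm{States}_s(P_{c_i})
\models \mathrm{EF}\,C_{i-1}$. Thus, by the induction hypothesis, it must hold that $\forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(P_{c_i} \cup
P_{i-1}, \pi_e), \mathrm{States}_s(P_{c_i}) \cup C_{i-1} \models \mathrm{EF}\,G$, which is equal to $\forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(P_i, \pi_e), C_i \models
\mathrm{EF}\,G$. □

Theorem B.17 (Soundness) WeakAdversarial is sound.

Proof. If WeakAdversarial$(s_0, G)$ returns a solution $\pi_s$ after iteration $i$ then $\pi_s = P_i$
and $s_0 \in C_i$. Thus, by Lemma B.37 $\forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(\pi_s, \pi_e), s_0 \models \mathrm{EF}\,G$. □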

Lemma B.38 If there exists a system plan $\pi_s$ where

\[ \forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(\pi_s, \pi_e), s \models \mathrm{EF}\,C \]

then there exists a weak marked adversarial DAG $AD_w(s, C)$.

Proof. If $s \in C$ then $AD_w(s, C)$ is the single terminal state $s$. Otherwise assume $s \notin C$.
Since $C$ can be reached from $s$ for any non-empty environment plan there must exist a set
of finite prefixes $P$ of execution paths in $\bigcup_{\pi_e \in \Pi_e^+} \mathrm{Exec}(s, \pi_s, \pi_e)$ that reach a state in $C$.
In addition, $P$ can be chosen such that

1. from each state $p$ on a path in $P$ visited prior to a state in $C$, a counter system action
for each applicable environment action $\mathrm{App}_e(p)$ exists, and

2. all system actions associated with paths in $P$ are counter actions.

The first property is fulfilled since no non-empty environment plan exists making $C$
unreachable. The second property holds since, in order for $C$ to be reachable, it is sufficient
only to apply a single counter action for each environment action that may transition to a
state closer to $C$.

Let the level of a state $q$ on one of these paths be defined by the maximum distance from
$q$ to $C$ for any of the paths visiting $q$. The paths then form a weak marking $AD_w(s, C)$ of
an adversarial DAG $AD(C)$ of $C$. □

Theorem B.18 (Completeness) WeakAdversarial is complete.

Proof. By contradiction. Assume that a system plan $\pi_s$ exists such that $\forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(\pi_s,
\pi_e), s_0 \models \mathrm{EF}\,G$ and WeakAdversarial$(s_0, G)$ terminates with "no solution exists".
By Lemma B.38, a weak marked adversarial DAG $AD_w(s_0, G)$ exists. By the definition
of $AD_w(s_0, G)$ we have that all states at level $i$ are fair with respect to their applicable
system actions and the states at level $j < i$. Thus, since PreCompWA prunes unfair
states from the preimage of $C$, all states in $AD_w(s_0, G)$ at level 1 are in $C_1$, all states
at level 2 are in $C_2$, etc. Hence, if $C$ has reached a maximum size (which must happen
since WeakAdversarial terminates with failure) then $C$ includes all states in $AD_w(s_0, G)$. However,
then WeakAdversarial returns with success since $s_0 \in AD_w(s_0, G)$,
which is impossible. □
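The fairness pruning that this completeness argument relies on can be sketched on explicit sets. The code below follows our reading of the definition of PreCompWA in the proof of Lemma B.36; it assumes joint transitions stored in a dictionary keyed by (state, system action, environment action), and the function name `pre_comp_wa` and the example data are hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

Joint = Dict[Tuple[int, str, str], FrozenSet[int]]   # (s, a_s, a_e) -> next states

# Hypothetical domain with goal state 9.
# State 0: the system can counter every environment action.
# State 1: the environment action "r" cannot be countered.
JOINT: Joint = {
    (0, "L", "l"): frozenset({9}), (0, "L", "r"): frozenset({0}),
    (0, "R", "l"): frozenset({0}), (0, "R", "r"): frozenset({9}),
    (1, "L", "l"): frozenset({9}), (1, "L", "r"): frozenset({1}),
    (1, "R", "l"): frozenset({9}), (1, "R", "r"): frozenset({1}),
}

def pre_comp_wa(joint: Joint, c: Set[int]) -> Set[Tuple[int, str]]:
    """State/system-action pairs outside c that are fair with respect to c."""
    # Candidate SSAs: outside c and able to reach c for some environment action.
    cand = {(s, a_s) for (s, a_s, _), nxt in joint.items()
            if s not in c and nxt & c}
    # Applicable environment actions per state.
    app_e: Dict[int, Set[str]] = {}
    for (s, _, a_e) in joint:
        app_e.setdefault(s, set()).add(a_e)

    def fair(s: int) -> bool:
        # Every environment action must have a counter action among the candidates.
        return all(
            any(joint.get((s, a_s, a_e), frozenset()) & c
                for (q, a_s) in cand if q == s)
            for a_e in app_e[s])

    return {(s, a_s) for (s, a_s) in cand if fair(s)}
```

State 1 is dropped even though both of its system actions can reach the goal, because the environment can always answer with "r"; this is the pruning of unfair states referred to in the proof.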


B.12  Strong  Cyclic  Adversarial 

Lemma B.39 Let $S_i$ be defined recursively by

\begin{align*}
S_0 &= \emptyset, \\
S_i &= SA \cap \mathrm{FairStates}(SA, C \cup \mathrm{States}_s(S_{i-1})) \times \mathit{Act}_s,
\end{align*}

then $S_{i+1} \supseteq S_i$.

Proof. By induction on $i$.

Case $i = 0$. Trivial since $S_0 = \emptyset$.

Case $i > 0$. The induction hypothesis is $S_i \supseteq S_{i-1}$. We have

\[ S_{i+1} = SA \cap \mathrm{FairStates}(SA, C \cup \mathrm{States}_s(S_i)) \times \mathit{Act}_s. \]

Thus, by definition of FairStates,

\[ S_{i+1} = SA \cap \{s : \forall a_e \in \mathrm{App}_e(s) \,.\, \exists a_s \in \mathrm{Act}_s(SA, s),\ s' \in C \cup \mathrm{States}_s(S_i) \,.\, s \xrightarrow{a_s, a_e} s'\} \times \mathit{Act}_s. \]

The induction hypothesis is $S_i \supseteq S_{i-1}$, which means

\begin{align*}
S_{i+1} &\supseteq SA \cap \{s : \forall a_e \in \mathrm{App}_e(s) \,.\, \exists a_s \in \mathrm{Act}_s(SA, s),\ s' \in C \cup \mathrm{States}_s(S_{i-1}) \,.\, s \xrightarrow{a_s, a_e} s'\} \times \mathit{Act}_s \\
 &= S_i.
\end{align*}

□

Lemma B.40 PreCompSCA is a valid precomponent function.

Proof. By inspection of PreCompSCA$(C)$, we have

\[ \mathrm{PreCompSCA}(C) \subseteq \mathrm{FixedPoint}(C). \]

Thus, if $(s, a) \in \mathrm{PreCompSCA}(C)$ then $a \in \mathrm{App}(s)$ and $s \notin C$. To prove that
PreCompSCA$(C)$ terminates, we observe that the only difference between PreCompSC$(C)$ and
PreCompSCA$(C)$ is that the subfunction PruneUnconnected has been substituted
with PruneUnfair. We can therefore reuse the proof for termination of PreCompSC$(C)$
except that we need to show that PruneUnfair terminates. Assume that PruneUnfair
diverges. By Lemma B.39, $\mathit{NewSA}_{i+1} \supseteq \mathit{NewSA}_i$. However, since PruneUnfair
diverges, we must have $\mathit{NewSA}_{i+1} \supset \mathit{NewSA}_i$ for $i \ge 0$, which is impossible since the
number of SSAs is finite. □
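The monotone chain of Lemma B.39 and the resulting termination of PruneUnfair can be observed on a small explicit example. The sketch below implements the recursion $S_0 = \emptyset$, $S_i = SA \cap \mathrm{FairStates}(SA, C \cup \mathrm{States}_s(S_{i-1})) \times \mathit{Act}_s$ directly on Python sets; the joint-transition dictionary, the function names, and the data are assumptions for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

SSA = Tuple[int, str]                                 # (state, system action)
Joint = Dict[Tuple[int, str, str], FrozenSet[int]]    # (s, a_s, a_e) -> next states

# Hypothetical domain with goal state 9.
JOINT: Joint = {
    (0, "L", "l"): frozenset({9}), (0, "L", "r"): frozenset({0}),
    (0, "R", "l"): frozenset({0}), (0, "R", "r"): frozenset({9}),
    (1, "L", "l"): frozenset({0}), (1, "L", "r"): frozenset({1}),
    (1, "R", "l"): frozenset({1}), (1, "R", "r"): frozenset({0}),
    (2, "L", "l"): frozenset({9}), (2, "L", "r"): frozenset({2}),
}
SA_ALL: Set[SSA] = {(s, a_s) for (s, a_s, _) in JOINT}

def fair_states(joint: Joint, sa: Set[SSA], target: Set[int]) -> Set[int]:
    """States of `sa` that can counter every environment action into `target`."""
    fair = set()
    for s in {q for q, _ in sa}:
        env = {a_e for (q, _, a_e) in joint if q == s}
        acts = {a_s for (q, a_s) in sa if q == s}
        if all(any(joint.get((s, a_s, a_e), frozenset()) & target for a_s in acts)
               for a_e in env):
            fair.add(s)
    return fair

def prune_unfair(joint: Joint, sa: Set[SSA], c: Set[int]):
    """Iterate S_i until it stabilises; returns (fixed point, chain S_0, S_1, ...)."""
    new_sa: Set[SSA] = set()
    chain: List[Set[SSA]] = [set()]
    while True:
        fair = fair_states(joint, sa, c | {s for s, _ in new_sa})
        grown = {x for x in sa if x[0] in fair}
        chain.append(grown)
        if grown == new_sa:
            return new_sa, chain
        new_sa = grown
```

Here state 0 is fair with respect to the goal, state 1 becomes fair only once state 0 has been added, and state 2 never becomes fair because the environment action "r" has no counter action. The chain therefore grows twice and then stabilises.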

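Lemma B.41 In each iteration $i$ of PruneUnfair$(SA, C)$, we have

\[ \forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(\mathit{NewSA}_i, \pi_e), \mathrm{States}_s(\mathit{NewSA}_i) \models \mathrm{EF}\,C. \]

Proof. By induction on $i$.

Case $i = 0$. $\mathit{NewSA}_0 = \emptyset$, which trivially fulfills the requirement.

Case $i > 0$. The induction hypothesis is

\[ \forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(\mathit{NewSA}_{i-1}, \pi_e), \mathrm{States}_s(\mathit{NewSA}_{i-1}) \models \mathrm{EF}\,C. \]

We have

\[ \mathit{NewSA}_i = SA \cap \mathrm{FairStates}(SA, C \cup \mathrm{States}_s(\mathit{NewSA}_{i-1})) \times \mathit{Act}_s. \]

Thus by definition of FairStates

\[ \mathit{NewSA}_i = SA \cap \{s : \forall a_e \in \mathrm{App}_e(s) \,.\, \exists a_s \in \mathrm{Act}_s(SA, s),\ s' \in C \cup \mathrm{States}_s(\mathit{NewSA}_{i-1}) \,.\, s \xrightarrow{a_s, a_e} s'\} \times \mathit{Act}_s. \]

Let $s \in \mathrm{States}_s(\mathit{NewSA}_i)$; then obviously

\[ \forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(\mathit{NewSA}_i, \pi_e), s \models \mathrm{EF}\,(C \cup \mathrm{States}_s(\mathit{NewSA}_{i-1})). \]

Thus,

\[ \forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(\mathit{NewSA}_i, \pi_e), \mathrm{States}_s(\mathit{NewSA}_i) \models \mathrm{EF}\,(C \cup \mathrm{States}_s(\mathit{NewSA}_{i-1})). \]

Combined with the induction hypothesis, we get $\forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(\mathit{NewSA}_i \cup \mathit{NewSA}_{i-1}, \pi_e),
\mathrm{States}_s(\mathit{NewSA}_i) \cup \mathrm{States}_s(\mathit{NewSA}_{i-1}) \models \mathrm{EF}\,C$. Thus, since by Lemma B.39 $\mathit{NewSA}_i
\supseteq \mathit{NewSA}_{i-1}$,

\[ \forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(\mathit{NewSA}_i, \pi_e), \mathrm{States}_s(\mathit{NewSA}_i) \models \mathrm{EF}\,C. \]

□

Lemma B.42 If SCAPlanAux$(\mathit{startSA}, C)$ returns $\pi_s$ then

\[ \forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(\pi_s, \pi_e), \mathrm{States}_s(\pi_s) \models \mathrm{AG}\,\mathrm{EF}\,C. \]

Proof. By inspection of SCAPlanAux, we have

\[ \pi_s = \mathrm{PruneUnfair}(\mathrm{PruneOutgoing}(\pi_s, C), C). \]

By definition of PruneOutgoing, if $(s, a_s) \in \pi_s$ then

\[ (s, a_s) \notin \mathrm{PreImgSSA}(\overline{C \cup \mathrm{States}_s(\pi_s)}). \]

Thus, for all $\pi_e \in \Pi_e^+$, if $s' \notin C \cup \mathrm{States}_s(\pi_s)$ then $(s, s') \notin R$ of $\mathcal{M}(\pi_s, \pi_e)$. This
means that no execution path in $\mathrm{Exec}(q_0, \pi_s, \pi_e)$ where $q_0 \in \mathrm{States}_s(\pi_s)$ can reach a
state outside of $C \cup \mathrm{States}_s(\pi_s)$. We still need to show that for all $\pi_e \in \Pi_e^+$ there is an
execution path reaching $C$ for each state in $\pi_s$. However, this follows from Lemma B.41,
since there exists an $i$ such that $\pi_s = \mathit{NewSA}_i$ of PruneUnfair. From the above, it
follows

\[ \forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(\pi_s, \pi_e), \mathrm{States}_s(\pi_s) \models \mathrm{AG}\,\mathrm{EF}\,C. \]

□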


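Lemma B.43 For StrongCyclicAdversarial$(s_0, G)$, we have

\[ \forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(P_i, \pi_e), C_i \models \mathrm{AG}\,\mathrm{EF}\,G. \]

Proof. By induction on $i$.

Case $i = 0$. We trivially have

\[ \forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(P_0, \pi_e), g \models \mathrm{AG}\,\mathrm{EF}\,G \]

for $g \in G$. Thus,

\[ \forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(P_0, \pi_e), C_0 \models \mathrm{AG}\,\mathrm{EF}\,G. \]

Case $i > 0$. The induction hypothesis is

\[ \forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(P_{i-1}, \pi_e), C_{i-1} \models \mathrm{AG}\,\mathrm{EF}\,G. \]

From Lemma B.42 and $P_{c_i} = \mathrm{PreCompSCA}(C_{i-1})$, we get

\[ \forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(P_{c_i}, \pi_e), \mathrm{States}_s(P_{c_i}) \models \mathrm{AG}\,\mathrm{EF}\,C_{i-1}. \]

Combined with the induction hypothesis, we get

\[ \forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(P_{c_i} \cup P_{i-1}, \pi_e), \mathrm{States}_s(P_{c_i}) \cup C_{i-1} \models \mathrm{AG}\,\mathrm{EF}\,G, \]

which is equal to

\[ \forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(P_i, \pi_e), C_i \models \mathrm{AG}\,\mathrm{EF}\,G. \]

□

Theorem B.19 (Soundness) StrongCyclicAdversarial is sound.

Proof. If StrongCyclicAdversarial$(s_0, G)$ returns a solution $\pi_s$ after iteration $i$ then
$\pi_s = P_i$ and $s_0 \in C_i$. Thus, by Lemma B.43 $\forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(\pi_s, \pi_e), s_0 \models \mathrm{AG}\,\mathrm{EF}\,G$. □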

Lemma B.44 If there exists a system plan $\pi_s$ where

\[ \forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(\pi_s, \pi_e), s \models \mathrm{AG}\,\mathrm{EF}\,C \]

then there exists a strong cyclic marked adversarial DAG $AD_{sc}(s, C)$.

Proof. If $s \in C$ then $AD_{sc}(s, C)$ is the single terminal state $s$. Otherwise, assume
$s \notin C$. Let $V$ denote the set of states that can be visited by any execution path in
$\bigcup_{\pi_e \in \Pi_e^+} \mathrm{Exec}(s, \pi_s, \pi_e)$ prior to a state in $C$. For each state $v \in V$, we have $\forall \pi_e \in
\Pi_e^+ \,.\, \mathcal{M}(\pi_s, \pi_e), v \models \mathrm{AG}\,\mathrm{EF}\,C$. Thus, it must be possible to define a counter system action
for each applicable environment action $\mathrm{App}_e(v)$ such that a set of finite paths reaching $C$
exists where the states of the paths are in $V$ and the associated system actions are counter
actions. Let the level of a state $q$ on one of these paths be defined by the maximum distance
from $q$ to $C$ for any of the paths visiting $q$. The paths can then be represented by a DAG
which is a subset of an adversarial DAG $AD(C)$ of $C$. Since the states of this DAG are $V$
and $s \in V$, a strong cyclic marking of the DAG exists that is equal to a valid strong cyclic
marking $AD_{sc}(s, C)$ of $AD(C)$. □

Lemma B.45 If a strong cyclic marked adversarial DAG $AD_{sc}(s, C)$ exists and
$AD_{sc}^{ssa}(s, C) \subseteq SA$ then $AD_{sc}^{ssa}(s, C) \subseteq \mathrm{PruneUnfair}(SA, C)$.

Proof. By inspection of PruneUnfair$(SA, C)$, it follows that all SSAs of a state $s$ in $SA$
are included in $\mathit{NewSA}_{i+1}$ if $s$ is fair with respect to the SSAs associated with $s$ and the states
in $\mathit{NewSA}_i$. Since all the states at level 0 in $AD_{sc}^{ssa}(s, C)$ are in $C$, we get that all SSAs of
states at level 1 are included in $\mathit{NewSA}_1$. Similarly, in iteration 2, we get that all SSAs of
states at level 2 are included in $\mathit{NewSA}_2$ and PruneUnfair$(SA, C)$ does not terminate in
iteration 1 if some states at level 2 are not in $\mathrm{States}_s(\mathit{NewSA}_1)$. Thus, by induction, an
iteration $k$ is reached where $AD_{sc}^{ssa}(s, C) \subseteq \mathit{NewSA}_k$. □

Theorem  B.20  (Completeness)  StrongCyclicAdversarial  is  complete. 

Proof. By contradiction. Assume that $\forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(\pi_s, \pi_e), s_0 \models \mathrm{AG}\,\mathrm{EF}\,G$, but
StrongCyclicAdversarial$(s_0, G)$ terminates in iteration $i$ with "no solution exists". The $i$th
call to PreCompSCA is finding a strong cyclic adversarial precomponent of $C_{i-1}$. Since
StrongCyclicAdversarial terminates in iteration $i$, we must have $\mathrm{PreCompSCA}(C_{i-1})
= \emptyset$. Since $\forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(\pi_s, \pi_e), s_0 \models \mathrm{AG}\,\mathrm{EF}\,G$, we also have

\[ \forall \pi_e \in \Pi_e^+ \,.\, \mathcal{M}(\pi_s, \pi_e), s_0 \models \mathrm{AG}\,\mathrm{EF}\,C_{i-1} \]

since by Lemma B.3 $C_{i-1} \supseteq G$. Thus, by Lemma B.44 a strong cyclic marked adversarial
DAG $AD_{sc}(s_0, C_{i-1})$ exists. We have from the completeness proof of StrongCyclic that
in some iteration of $\mathrm{PreCompSCA}(C_{i-1})$, $\mathit{wSA} = \mathrm{FixedPoint}(C_{i-1})$. From the definition
of $AD_{sc}(s_0, C_{i-1})$, it is clear that $AD_{sc}^{ssa}(s_0, C_{i-1}) \subseteq \mathrm{FixedPoint}(C_{i-1})$. Consider
the pruning of wSA in SCAPlanAux. No SSAs in $AD_{sc}^{ssa}(s_0, C_{i-1})$ will be pruned by
PruneOutgoing since $AD_{sc}^{ssa}(s_0, C_{i-1})$ has no SSAs leading out from $\mathrm{States}(AD_{sc}^{ssa}(s_0,
C_{i-1})) \cup C_{i-1}$. In addition, by Lemma B.45 no SSAs in $AD_{sc}^{ssa}(s_0, C_{i-1})$ will be pruned
by PruneUnfair$(SA, C_{i-1})$ since $AD_{sc}^{ssa}(s_0, C_{i-1}) \subseteq SA$. Thus SCAPlanAux and
PreCompSCA return a non-empty result, which is impossible. □


